Hybrid quantum-classical computer for Bayesian inference with engineered likelihood functions for robust amplitude estimation

ABSTRACT

A hybrid quantum-classical (HQC) computer takes advantage of the available quantum coherence to maximally enhance the power of sampling on noisy quantum devices, reducing measurement number and runtime compared to VQE. The HQC computer derives inspiration from quantum metrology, phase estimation, and the more recent “alpha-VQE” proposal, arriving at a general formulation that is robust to error and does not require ancilla qubits. The HQC computer uses the “engineered likelihood function” (ELF) to carry out Bayesian inference. The ELF formalism enhances the quantum advantage in sampling as the physical hardware transitions from the regime of noisy intermediate-scale quantum computers into that of quantum error corrected ones. This technique speeds up a central component of many quantum algorithms, with applications including chemistry, materials, finance, and beyond.

BACKGROUND

Quantum computers promise to solve industry-critical problems which areotherwise unsolvable or only very inefficiently addressable usingclassical computers. Key application areas include chemistry andmaterials, bioscience and bioinformatics, logistics, and finance.Interest in quantum computing has recently surged, in part due to a waveof advances in the performance of ready-to-use quantum computers.However, near-term quantum devices are still extremely limited inresources, preventing the deployment of quantum computers on problems ofpractical interest.

A recent flurry of methods which cater to the limitations of near-termquantum devices have drawn significant attention. These methods includethe variational quantum eigensolver (VQE), quantum approximateoptimization algorithm (QAOA) and variants, variational quantum linearsystems solver, other quantum algorithms leveraging the variationalprinciples, and quantum machine learning algorithms. In spite of suchalgorithmic innovations, many of these approaches have appeared to beimpractical for commercially-relevant problems owing to their high costin terms of number of measurements and runtime. Yet, methods offering aquadratic speedup in runtime, such as phase estimation, demand quantumresources that are far beyond the reach of near-term devices formoderately large problem instances.

SUMMARY

The number of measurements demanded by hybrid quantum-classicalalgorithms such as the variational quantum eigensolver (VQE) isprohibitively high for many problems of practical value. Quantumalgorithms which reduce this cost (e.g. quantum amplitude and phaseestimation) require error rates that are too low for near-termimplementation. Embodiments of the present invention include hybridquantum-classical (HQC) computers, and methods performed by HQCcomputers, which take advantage of the available quantum coherence tomaximally enhance the power of sampling on noisy quantum devices,reducing measurement number and runtime compared to VQE. Suchembodiments derive inspiration from quantum metrology, phase estimation,and the more recent “alpha-VQE” proposal, arriving at a generalformulation that is robust to error and does not require ancilla qubits.The central object of this method is what we call the “engineeredlikelihood function” (ELF), used for carrying out Bayesian inference.Embodiments of the present invention use the ELF formalism to enhancethe quantum advantage in sampling as the physical hardware transitionsfrom the regime of noisy intermediate-scale quantum computers into thatof quantum error corrected ones. This technique speeds up a centralcomponent of many quantum algorithms, with applications includingchemistry, materials, finance, and beyond.

Other features and advantages of various aspects and embodiments of thepresent invention will become apparent from the following descriptionand from the claims.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a diagram of a quantum computer according to one embodiment ofthe present invention;

FIG. 2A is a flowchart of a method performed by the quantum computer ofFIG. 1 according to one embodiment of the present invention;

FIG. 2B is a diagram of a hybrid quantum-classical computer whichperforms quantum annealing according to one embodiment of the presentinvention;

FIG. 3 is a diagram of a hybrid quantum-classical computer according toone embodiment of the present invention;

FIG. 4 is a diagram of a hybrid quantum-classical (HQC) computer forperforming quantum amplitude estimation according to one embodiment ofthe present invention;

FIGS. 5A-5C illustrate quantum circuits of standard sampling and of someembodiments of the present invention, along with their correspondinglikelihood functions;

FIGS. 6A-6B illustrate plots showing the dependence of the Fisherinformation on various likelihood functions;

FIG. 7 illustrates operations used for generating samples thatcorrespond to an engineered likelihood function according to oneembodiment of the present invention;

FIG. 8 illustrates an algorithm implemented by embodiments of thepresent invention;

FIGS. 9A-9B, 10, 11A-11B, and 12 illustrate various algorithms performedby embodiments of the present invention;

FIG. 13 is a plot of true and fitted likelihood functions according tovarious embodiments of the present invention;

FIGS. 14A-14B, 15A-15B, 16A-16B, 17A-17B, and 18A-18B show plotsillustrating performance of various embodiments of the presentinvention;

FIG. 19 shows the

factors of various embodiments of the present invention;

FIGS. 20A-20B and 21A-21B show plots illustrating performance of variousembodiments of the present invention;

FIGS. 22A-22B show the

factors of various embodiments of the present invention;

FIGS. 23A-23B and 24A-24B show plots illustrating performance of variousembodiments of the present invention;

FIGS. 25A-25B show the

factors of various embodiments of the present invention;

FIG. 26 shows plots illustrating the runtime to target accuracy ofvarious embodiments of the present invention;

FIGS. 27-28 show quantum circuits implemented according to embodimentsof the present invention;

FIGS. 29A-29B, 30A-30B, 31A-31B and 32 show algorithms implementedaccording to embodiments of the present invention; and

FIG. 33 shows true and fitted likelihood functions according toembodiments of the present invention.

DETAILED DESCRIPTION

Embodiments of the present invention are directed to a hybridquantum-classical (HQC) computer which performs quantum amplitudeestimation. Referring to FIG. 4 , a flow diagram is shown of a HQC 430,including both a quantum computer 432 and a classical computer 434,which performs a method of quantum amplitude estimation according to oneembodiment of the present invention. In the block 404, which isperformed with the classical computer 434, a plurality ofquantum-circuit-parameter values is selected to optimize anaccuracy-improvement rate of a statistic estimating an expectation value

s|P|s

of an observable P 400 with respect to a quantum state |

402.

In embodiments, the statistic is a sample mean calculated from aplurality of values sampled from a random variable. In the presentdiscussion, these sampled values are obtained by measuring qubits of thequantum computer 432. However, the statistic may alternatively be askewness, kurtosis, quantile, or another type of statistic withoutdeparting from the scope hereof. The statistic is an estimator of theexpectation value

s|P|z

, and may be either biases or unbiased. The plurality of values may bemodeled according to a probability distribution, in which case thestatistic may represent a parameter of the probability distribution. Forexample, the statistic may represent a mean of a Gaussian distribution,as described in more detail below in Section 3.2.

The quantum-circuit parameter values are real numbers that control howquantum gates operate on qubits. In the present discussion, each quantumcircuit can be represented as a sequence of quantum gates in which eachquantum gate of the sequence is controlled by one of the quantum-circuitparameter values. For example, each of the quantum-circuit parametersvalue may represent an angle by which the state of one or more qubits isrotated in a corresponding Hilbert space.

The accuracy-improvement rate is a function that expresses by how much acorresponding accuracy of the statistic improves with each iteration ofthe present method embodiments. The accuracy-improvement rate is afunction of the quantum-circuit parameters, and may additionally be afunction of the statistic (e.g., the mean). The accuracy is anyquantitative measure of an error of the statistic. For example, theaccuracy may be a mean squared error, standard deviation, variance, meanabsolute error, or another moment of the error. Alternatively, theaccuracy may be an information metric, such as Fisher information orinformation entropy. Examples that use variance for the accuracy aredescribed in more detail below (e.g., see Eqn. 36). In these examples,the accuracy-information rate may be the variance reduction factorintroduced in Eqn. 38. Alternatively, the accuracy-information rate maybe Fisher information (e.g., see Eqn. 42). However, theaccuracy-improvement rate may be another function quantifyingimprovements in the accuracy without departing from the scope hereof.

In some embodiments, the plurality of quantum-circuit-parameter valuesis selected using one of coordinate ascent and gradient descent. Both ofthese techniques are described in more detail in Section 4.1.1.

In the block 406, which is performed with the quantum computer 432, asequence of alternating first and second generalized reflectionoperators is applied to one or more qubits of the quantum computer 432to transform the one or more qubits from the quantum state |s

into a reflected quantum state. Each of the first and second generalizedreflection operators are controlled according to a corresponding one ofthe plurality of quantum-circuit-parameter values. The operators U(x)and V(y) described in Section 3.1 are examples of the first and secondgeneralized reflection operators, respectively. The operator Q({rightarrow over (x)}) introduced in Eqn. 19 is one example of a sequence ofalternative first and second generalized reflection operators, where thevector 2 represents the plurality of quantum-circuit-parameter values.The sequence of first and second generalized reflection operators andthe observable P may define a bias of an engineered likelihood function,as described below with respect to Eqn. 26.

In the block 408, which is also performed with the quantum computer 432,the plurality of qubits in the reflected quantum state are measured withrespect to the observable P to obtain a set of measurement outcomes. Inthe block 410, which is performed on the classical computer 434, thestatistic is updated with the set of measurement outcomes to obtain anestimate of

s|P|s

with higher accuracy.

The method may further include outputting the statistic after saidupdating. Alternatively, the method may iterate over the blocks 404,406, 408, and 410. In some embodiments, the method further includesupdating, on the classical computer 434 and with the set of measurementoutcomes, an accuracy estimate of the statistic. The accuracy estimateis the calculated value of the accuracy described above (e.g.,variance). In these embodiments, the method iterates over the blocks404, 406, 408, and 410 until the accuracy estimate falls below athreshold.

In some embodiments, the statistic is updated in the block 410 byupdating a prior distribution with the plurality of measurements toobtain a posterior distribution, and calculating an updated statisticfrom the posterior distribution.

In some embodiments, the plurality of quantum-circuit-parameter valuesis selected based on the statistic and an accuracy estimate of thestatistic. The plurality of quantum-circuit-parameter values may furtherbe selected based on a fidelity representing errors occurring duringsaid applying and measuring.

1.1. Introduction

The combination of phase estimation and the Bayesian perspective givesrise to Bayesian phase estimation techniques that are more suitable fornoisy quantum devices capable of realizing limited depth quantumcircuits than the early proposals. Adopting the notation from above, thecircuit parameters {right arrow over (θ)}=(m, β) and the goal is toestimate the phase Π in an eigenvalue e^(i arcos Π) of the operator U.An important note is that the likelihood function here,

$\begin{matrix}{{{p\left( {{d❘m},\beta,\Pi} \right)} = {\frac{1}{2}\left( {1 + {\left( {- 1} \right)^{d}\left\lbrack {{\cos(\beta){\mathcal{T}_{m}(\Pi)}} + {\sin(\beta){\mathcal{U}_{m}(\Pi)}}} \right\rbrack}} \right)}},} & (1)\end{matrix}$where

_(m) and

_(m) are Chebyshev polynomials of first and second kind respectively, isshared in many settings beyond Bayesian phase estimation. Thiscommonality makes the Bayesian inference machinery used for tasks suchas Hamiltonian characterization relevant to phase estimation. In theexponential advantage of Bayesian inference with a Gaussian prior overother non-adaptive sampling methods is established by showing that theexpected posterior variance σ decays exponentially in the number ofinference steps. Such exponential convergence is at a cost of O(1/σ)amount of quantum coherence required at each inference step. Suchscaling is also confirmed in the context of Bayesian phase estimation.

Equipped with the techniques of Bayesian phase estimation as well as theperspective of overlap estimation as an amplitude estimation problem,one may devise a Bayesian inference method for operator measurement thatsmoothly interpolates between the standard sampling regime and phaseestimation regime. This is proposed as “α-VQE”, where the asymptoticscaling for performing an operator measurement is O(1/ε^(α)) with theextremal values of α=2 corresponding to the standard sampling regime(typically realized in VQE) and α=1 corresponding to thequantum-enhanced regime where the scaling reaches the Heisenberg limit(typically realized with phase estimation). By varying the parametersfor the Bayesian inference one can also achieve a values between 1 and2. The lower the α value, the deeper the quantum circuit needed forBayesian phase estimation. This accomplishes the trade-off betweenquantum coherence and asymptotic speedup for the measurement process.

It is also worth noting that phase estimation is not the only paradigmthat can reach the Heisenberg limit for amplitude estimation. Inprevious works, the authors consider the task of estimating theparameter θ of a quantum state ρ_(θ). A parallel strategy is proposedwhere m copies of the parametrized circuit for generating ρ_(θ),together with an entangled initial state and measurements in anentangled basis, are used to create states with the parameter θamplified to mθ. Such amplification can also give rise to likelihoodfunctions that are similar to that in Equation 1. In previous work it isshown that with randomized quantum operations and Bayesian inference onecan extract information in fewer iteration than classical sampling evenin the presence of noise. In quantum amplitude estimation circuits withvarying numbers m of iterations and numbers N of measurements areconsidered. A particularly chosen set of pairs (m, N) gives rise to alikelihood function that can be used for inferring the amplitude to beestimated. Heisenberg limit is demonstrated for one particularlikelihood function construction given by the authors. Both workshighlight the power of parametrized likelihood functions, making ittempting to investigate their performance under imperfect hardwareconditions.

1.2. Main results Embodiments of the present invention include systemsand methods for estimating the expectation Π=

A|O|A

where the state |A

can be prepared by a quantum circuit A such that |A

=A|0

. Embodiments of the present invention may use a family of quantumcircuits, such that, as the circuit deepens with more repetitions of A,it allows for likelihood functions that are polynomial in Π of everhigher degree. As described in the next section with a concrete example,a direct consequence of this increase in polynomial degree is anincrease in the power of inference, which can be quantified by Fisherinformation gain at each inference step. After establishing this“enhanced sampling” technique, embodiments of the present invention mayintroduce parameters into the quantum circuit and render the resultinglikelihood function tunable. Embodiments of the present invention mayoptimize the parameters for maximal information gain during each step ofinference. The following lines of insight emerge from our efforts:

-   -   1. The role of noise and error in amplitude estimation: Previous        works have revealed the impact of noise on the likelihood        function and the output estimation of the Hamiltonian spectrum.        The disclosure herein investigates the same for schemes of        amplitude estimation used by embodiments of the present        invention. The description herein demonstrates that while noise        and error does increase the runtime needed for producing an        output that is within a specific statistical error tolerance,        they do not necessarily introduce systematic bias in the output        of the estimation algorithm. Systematic bias in the estimate can        be suppressed by using active noise-tailoring techniques and        calibrating the effect of noise.

Simulation using realistic error parameters for near-term devices hasrevealed that the enhanced sampling scheme can outperform VQE in termsof sampling efficiency. Experimental results have also revealed aperspective on tolerating error in quantum algorithm implementationwhere higher fidelity does not necessarily lead to better algorithmicperformance. In certain embodiments of the present invention, there isan optimal circuit fidelity of roughly 0.6 at which the enhanced schemeyields the maximum amount of quantum speedup.

-   -   1. The role of likelihood function tunability: Parametrized        likelihood functions may be used in phase estimation or        amplitude estimation routines. To our knowledge, all of the        current methods focus on likelihood functions of the Chebyshev        form (Equation 1). For these Chebyshev likelihood functions        (CLF), in the presence of noise there are specific values of the        parameter Π(the “dead spots”) for which the CLFs are        significantly less effective for inference than other values of        Π. Embodiments of the present invention may remove those dead        spots by engineering the form of the likelihood function with        generalized reflection operators whose angle parameters are made        tunable.    -   2. Runtime model for estimation as error rates decrease:        Previous works have demonstrated smooth transitions in the        asymptotic cost scaling from the O(1/ε²) of VQE to O(1/ε) of        phase estimation. Embodiments of the present invention advance        this line of thinking by developing a model for estimating the        runtime t_(ε) to target accuracy E using devices with degree of        noise λ (c.f Section 6):

$\begin{matrix}{t_{\varepsilon} \sim {{\mathcal{O}\left( {\frac{\lambda}{\varepsilon^{2}} + \frac{1}{\sqrt{2}\varepsilon} + \sqrt{\left( \frac{\lambda}{\varepsilon^{2}} \right)^{2} + \left( \frac{2\sqrt{2}}{\varepsilon} \right)^{2}}} \right)}.}} & (2)\end{matrix}$

-   -   3. The model interpolates between the O(1/ε) scaling and O(1/ε²)        scaling as a function of A. Such bounds also allow embodiments        of the present invention to be analyzed for quantum speedup as a        function of hardware specifications such as the number of qubits        and two-qubit fidelity, and therefore estimate runtimes using        realistic parameters for current and future hardware.

Subsequent sections of the present disclosed are organized as follows.Section 2 presents a concrete example of a scheme implemented accordingto one embodiment of the present invention. Subsequent sections thenexpand on the general formulation of this scheme according to variousembodiments of the present invention. Section 3 describes in detail thegeneral quantum circuit construction for realizing ELFs, and analyzesthe structure of ELF in both noisy and noiseless settings. In additionto the quantum circuit scheme, embodiments of the present invention alsoinvolve 1) tuning the circuit parameter to maximize information gain,and 2) Bayesian inference for updating the current belief about thedistribution of the true value of Π. Section 4 presents heuristicalgorithms for both. Numerical results are presented in Section 5,comparing embodiments of the present invention with existing methodsbased on CLFs. Section 6 discloses a runtime model and derives theexpression in (2). Section 7 discloses implications of the disclosedresults from a broad perspective of quantum computing.

TABLE 1 Comparison of our scheme with relevant proposals that appear inthe literature. Here the list of features include whether the quantumcircuit used in the scheme requires ancilla qubits in addition to qubitsholding the state for overlap estimation, whether the scheme usesBayesian inference, whether any noise resilience is considered, whetherthe initial state is required to be an eigenstate, and whether thelikelihood function is fully tunable like ELF proposed here orrestricted to Chebyshev likelihood functions. Bayesian Noise FullyRequires Requires Scheme inference consideration tunable LFs ancillaeigenstate Knill et al. No No No Yes No Svore et al. No No No Yes YesWiebe and Yes Yes No Yes Yes Grenade Wang et al. Yes Yes No Yes YesO'Brien et al. Yes Yes No Yes No Zintchenko No Yes No No No and WiebeSuzuki et al. No No No No No This work Yes Yes Yes No No (Section 1)This work Yes Yes Yes Yes No (Appendix A)2. A First Example

There are two main strategies for estimating the expectation value

A|P|A

of some operator P with respect to a quantum state |A

. The method of quantum amplitude estimation provides a provable quantumspeedup with respect to certain computational models. However, toachieve precision E in the estimate, the circuit depth needed in thismethod scales as O(1/ε), making it impractical for near term quantumcomputers. The variational quantum eigensolver uses standard sampling tocarry out amplitude estimation. Standard sampling allows for low-depthquantum circuits, making it more amenable to implementation on near termquantum computers. However, in practice, the inefficiency of this methodmakes VQE impractical for many problems of interest. This sectionintroduces a method of enhanced sampling for amplitude estimation, whichmay be used by embodiments of the present invention. This techniqueseeks to maximize the statistical power of noisy quantum devices. Thismethod is described as starting from a simple analysis of standardsampling as used in VQE.

The energy estimation subroutine of VQE estimates amplitudes withrespect to Pauli strings. For a Hamiltonian decomposed into a linearcombination of Pauli strings H=Σ_(j)μ_(j)P_(j) and “ansatz state”|A

, the energy expectation value is estimated as a linear combination ofPauli expectation value estimates

$\begin{matrix}{{\hat{E} = {\sum\limits_{j}{\mu_{j}{\hat{\Pi}}_{j}}}},} & (3)\end{matrix}$where {circumflex over (Π)}_(j) is the (amplitude) estimate of

A|P_(j)|A

. VQE uses the standard sampling method to build up Pauli expectationvalue estimates with respect to the ansatz state, which can besummarized as the following:Standard Sampling:

-   -   1. Prepare |A        and measure operator P receiving outcome d=0, 1.    -   2. Repeat M times, receiving k outcomes labeled 0 and M−k        outcomes labeled 1.    -   3. Estimate Π=        A|P|A        as

$\hat{\Pi} = {\frac{k - \left( {M - k} \right)}{M}.}$

The performance of this estimation strategy may be quantified using themean squared error of the estimator as a function of time t=TM, where Tis the time cost of each measurement. Because the estimator is unbiased,the mean squared error is simply the variance in the estimator,

$\begin{matrix}{{{MSE}\left( \hat{\Pi} \right)} = {\frac{1 - \Pi^{2}}{M}.}} & (4)\end{matrix}$

For a specific mean squared error MSE({circumflex over (Π)})=ε², theruntime of the algorithm needed to ensure mean squared error ε² is

$\begin{matrix}{t_{\varepsilon} = {T{\frac{1 - \Pi^{2}}{\varepsilon^{2}}.}}} & (5)\end{matrix}$

The total runtime of energy estimation in VQE is the sum of the runtimesof the individual Pauli expectation value estimation runtimes. Forproblems of interest, this runtime may be far too costly, even whencertain parallelization techniques are used. The source of this cost isthe insensitivity of the standard sampling estimation process to smalldeviations in the expected information gain about Π contained in thestandard-sampling measurement outcome data is low.

Generally, we can measure the information gain of an estimation processof M repetitions of standard sampling with the Fisher information

$\begin{matrix}\begin{matrix}{{I_{M}(\Pi)} = {{\mathbb{E}}_{D}\left\lbrack \left( {\frac{\partial}{\partial\Pi}\log\left( {D❘\Pi} \right)} \right)^{2} \right\rbrack}} \\{= {- {{\mathbb{E}}_{D}\left\lbrack {\frac{\partial^{2}}{\partial\Pi^{2}}\log\left( {D❘\Pi} \right)} \right\rbrack}}} \\{{= {\sum\limits_{D}{\left( {\frac{\partial}{\partial\Pi}\left( {D❘\Pi} \right)} \right)^{2}}}},}\end{matrix} & (6)\end{matrix}$where D={d₁, d₂, . . . , d_(M)} is the set of outcomes from Mrepetitions of the standard sampling. The Fisher information identifiesthe likelihood function

(D|Π) as being responsible for information gain. A lower bound the meansquared error of an (unbiased) estimator may be obtained with theCramer-Rao bound

$\begin{matrix}{{{MSE}\left( \hat{\Pi} \right)} \geq {\frac{1}{I_{M}(\Pi)}.}} & (7)\end{matrix}$

Using the fact that the Fisher information is additive in the number ofsamples, we have I_(M)=(Π)=MI₁(Π) where I₁(Π)=1/(1−Π²) is the Fisherinformation of a single sample drawn from likelihood function

(d|Π)=(1+(−1)^(d)Π)/2. Using the Cramer-Rao bound, a lower bound may befound for the runtime of the estimation process as

$\begin{matrix}{{t_{\varepsilon} \geq \frac{T}{{I_{1}(\Pi)}\varepsilon^{2}}},} & (8)\end{matrix}$which shows that in order to reduce the runtime of an estimationalgorithm, embodiments of the present invention may increase the Fisherinformation.

One purpose of enhanced sampling is to reduce the runtime of overlapestimation by engineering likelihood functions which increase the rateof information gain. Consider the simplest case of enhanced sampling,which is illustrated in FIGS. 5A-5C. To generate data, embodiments ofthe present invention may prepare the ansatz state |A

, apply the operation P, apply a phase flip about the ansatz state, andthen measure P. The phase flip about the ansatz state may be achieved byapplying the inverse of the ansatz circuit A⁻¹, applying a phase flipabout the initial state R₀=2|0^(N)

0 ^(N)|−I, and then re-applying the ansatz circuit A. In this case, thelikelihood function becomes

$\begin{matrix}\begin{matrix}{{\left( {d❘\Pi} \right)} = \frac{1 + {\left( {- 1} \right)^{d}{\cos\left( {3{arc}{\cos(\Pi)}} \right)}}}{2}} \\{= {\frac{1 + {\left( {- 1} \right)^{d}\left( {{4\Pi^{3}} - {3\Pi}} \right)}}{2}.}}\end{matrix} & (9)\end{matrix}$

The bias is a degree-3 Chebyshev polynomial in Π. The disclosure hereinwill refer to such likelihood functions as Chebyshev likelihoodfunctions (CLFs).

In order to compare the Chebyshev likelihood function of enhancedsampling to that of standard sampling, consider the case of Π=0. Here,

(0|Π=0)=

(1|Π=0) and so the Fisher information is proportional to the square ofthe slope of the likelihood function

$\begin{matrix}{{I_{1}\left( {\Pi = 0} \right)} = {\left( \frac{\partial\left( {d = {0❘\Pi}} \right)}{\partial\Pi} \right)^{2}.}} & (10)\end{matrix}$

As seen in FIG. 5B, the slope of the Chebyshev likelihood function atΠ=0 is steeper than that of the standard sampling likelihood function.The single sample Fisher information in each case evaluates toStandard:I ₁(Π=0)=1Enhanced:I ₁(Π=0)=9,  (11)demonstrating how a simple variant of the quantum circuit may enhanceinformation gain. In this example, using the simplest case of enhancedsampling may reduce the number of measurements needed to achieve atarget error by at least a factor of nine. As will be discussed later,embodiments of the present invention may further increase the Fisherinformation by applying L layers of PA^(†)R₀A before measuring P. Infact, the Fisher information

${I_{1}(\Pi)} = {\frac{\left( {{2L} + 1} \right)^{2}}{1 - \Pi^{2}} = {0\left( L^{2} \right)}}$grows quadratically in L.

Not yet described is an estimation scheme that converts enhancedsampling measurement data into an estimation. One intricacy thatenhanced sampling introduces is the option to vary L as embodiments ofthe present invention are collecting measurement data. In this case,given a set of measurement outcomes from circuits with varying L, thesample mean of the 0 and 1 counts loses its meaning. Instead of usingthe sample mean, to process the measurement outcomes into informationabout Π embodiments of the present invention may use Bayesian inference.Section 2 describes certain embodiments which use Bayesian inference forestimation.

At this point, one may be tempted to point out that the comparisonbetween standard sampling and enhanced sampling is unfair because onlyone query to A is used in the standard sampling case while the enhancedsampling scheme uses three queries of A. It seems that if one considersa likelihood function that arises from three standard sampling steps,one could also yield a cubic polynomial form in the likelihood function.Indeed, suppose one performs three independent standard sampling stepsyielding results x₁, x₂, x₃∈{0, 1}, and produces a binary outcome z∈{0,1} classically by sampling from a distribution

(z|x₁, x₂, x₃). Then the likelihood function takes the form of

$\begin{matrix}\begin{matrix}{{\left( {z❘\Pi} \right)} = {\sum\limits_{x_{1},x_{2},x_{3}}{\left( {{z❘x_{1}},x_{2},x_{3}} \right)\left( {x_{1},x_{2},{x_{3}❘\Pi}} \right)}}} \\{{= {\sum\limits_{i = 0}^{3}{{\alpha_{i}\begin{pmatrix}3 \\i\end{pmatrix}}\left( \frac{1 + \Pi}{2} \right)^{i}\left( \frac{1 - \Pi}{2} \right)^{3 - i}}}},}\end{matrix} & (12)\end{matrix}$where each α_(i)∈[0, 1] is a parameter that can be tuned classicallythrough changing the distribution

(z|x₁, x₂, x₃). More specifically, α_(i)=Σ_(x) ₁ _(x) ₂ _(x) ₃ _(:h(x) ₁_(x) ₂ _(x) ₃ _()=i)

(z|x₁, x₂, x₃) where h(x₁x₂x₃) is the Hamming weight of the bit stringx₁x₂x₃. Suppose it is desired that

(z=0|Π) to be equal to

(d=0|Π) in Equation 9. This implies that α₀=1, α₁=2, α₂=3 and α₃=0,which is clearly beyond the classical tunability of the likelihoodfunction in Equation 12. This is an evidence which suggests that thelikelihood function arising from the quantum scheme in Equation 9 isbeyond classical means.

As the number of circuit layers L is increased, the time per sample Tgrows linearly in L. This linear growth in circuit layer number, alongwith the quadratic growth in Fisher information leads to a lower boundon the expected runtime,

$\begin{matrix}{{t_{\varepsilon} \in {\Omega\left( \frac{1}{L\varepsilon^{2}} \right)}},} & (13)\end{matrix}$assuming a fixed-L estimation strategy with an unbiased estimator. Inpractice, the operations implemented on the quantum computer are subjectto error. Fortunately, embodiments of the present invention may useBayesian inference, which can incorporate such errors into theestimation process. As long as the influence of errors on the form ofthe likelihood function is accurately modeled, the principal effect ofsuch errors is only to slow the rate of information gain. Error in thequantum circuit accumulates as the number of circuit layers L isincreased. Consequently, beyond a certain number of circuit layers,diminishing returns will be received with respect to gains in Fisherinformation (or the reduction in runtime). The estimation algorithm maythen seek to balance these competing factors in order to optimize theoverall performance.

The introduction of error poses another issue for estimation. Withouterror, the Fisher information gain per sample in the enhanced samplingcase with L=1 is greater than or equal to 9 for all Π. As shown in FIGS.6A-6B, with the introduction of even a small degree of error, the valuesof Π where the likelihood function is flat incur a dramatic drop inFisher information. Such regions are referred to herein as estimationdead spots. This observation motivates the concept of engineeringlikelihood functions (ELF) to increase their statistical power. Bypromoting the P and R₀ operations to generalized reflectionsU(x₁)=exp(−ix₁P) and R₀(y₂)=exp(−ix₂R₀) embodiments of the presentinvention may use rotation angles such that the information gain isboosted around such dead spots. Even for deeper enhanced samplingcircuits, engineering likelihood functions allows embodiments of thepresent invention to mitigate the effect of estimation dead spots.

3. Engineered Likelihood Functions

This section describes a methodology of engineering likelihood functionsfor amplitude estimation which may be used by embodiments of the presentinvention. First, quantum circuits for drawing samples that correspondto engineered likelihood functions are described, and then techniquesfor tuning the circuit parameters and carry out Bayesian inference withthe resultant likelihood functions are described.

3.1. Quantum Circuits for Engineered Likelihood Functions

Techniques will now be described for designing, implementing, andexecuting a procedure on a computer (e.g., a quantum computer or ahybrid quantum-classical computer) for estimating the expectation valueΠ=cos(θ)=

A|P|A

,  (18)where |A

=A|0^(n)

in which A is an n-qubit unitary operator, P is an n-qubit Hermitianoperator with eigenvalues ±1, and θ=arccos (Π) is introduced tofacilitate Bayesian inference later on. In constructing the estimationalgorithms disclosed herein, it may be assumed that embodiments of thepresent invention are able to perform the following primitiveoperations. First, embodiments of the present invention may prepare thecomputational basis state |0^(n)

and apply a quantum circuit A to it, obtaining |A

=A|0

^(⊗n). Second, embodiments of the present invention implement theunitary operator U(x)=exp(−ixP) for any angle x∈

. Finally, embodiments of the present invention perform the measurementof P, which is modeled as a projection-valued measure

$\left\{ {\frac{I + P}{2},\frac{I - P}{2}} \right\}$with respective outcome labels {0, 1}. Embodiments of the presentinvention may also make use of the unitary operator V(y)=AR₀(y)A^(†),where R₀(y)=exp(−iy(2|0^(n)

0^(n)−I)) and y∈

. Following the convention, U(x) and V(y) will be referred to herein asthe generalized reflections about the +1 eigenspace of P and the state|A

, respectively, where x and y are the angles of these generalizedreflections, respectively.

Embodiments of the present invention may use the ancilla-free¹ quantumcircuit in FIG. 7 to generate the engineered likelihood function (ELF),which is the probability distribution of the outcome d∈{0, 1} given theunknown quantity θ to be estimated. The circuit may, for example,include a sequence of generalize reflections. Specifically, afterpreparing the ansatz state |A

=A|0

^(⊗n), embodiments of the present invention may apply 2L generalizedreflections U(x₁), V(x₂), . . . , U(x_(2L−1)), V(x_(2L)) to it, varyingthe rotation angle x_(j) in each operation. For convenience,V(x_(2j))U(x_(2j−1)) will be referred to herein as the j-th layer of thecircuit, for j=1, 2, . . . , L. The output state of this circuit isQ({right arrow over (x)})≡A

=V(x _(2L))U(x _(2L−1)) . . . V(x ₂)U(x _(j))|A

,  (19)where {right arrow over (x)}=(x₁, x₂, . . . , x_(2L−1), x_(2L))∈

^(2L) is the vector of tunable parameters. Finally, embodiments of thepresent invention may perform the projective measurement

$\left\{ {\frac{I + P}{2},\frac{I - P}{2}} \right\}$on this state, receiving an outcome d∈{0, 1}. We call this scheme“ancilla-free” (AF) since it does not involve any ancilla qubits. InAppendix A, we consider a different scheme named the “ancilla-based”(AB) scheme which involves one ancilla qubit.

As in Grover's algorithm, the generalized reflections U(x_(2j−1)) andV(x_(2j)) ensure that the quantum state remains in two-dimensionalsubspace S:=span{|A

,P|A

}² for any j. Let |A^(#)) be the state (unique, up to a phase) in S thatis orthogonal to |A

, i.e.

$\begin{matrix}{\left. ❘A^{\bot} \right\rangle = {\frac{{P\left. ❘A \right\rangle} - {\left\langle A❘ \right.P\left. ❘A \right\rangle\left. ❘A \right\rangle}}{\sqrt{1 - {\left\langle A❘ \right.P\left. ❘A \right\rangle^{2}}}}.}} & (20)\end{matrix}$

To help the analysis, we will view this two-dimensional subspace as aqubit, writing |A

and |A^(#)

as |0

and |1

respectively. ²To ensure that S is two-dimensional, assume that Π≠±1,i.e. θ≠0 or π.

Let X, Y, Z and Ī be the Pauli operators and identity operator on thisvirtual qubit respectively. Then, focusing on the subspace S=span{|0

, |1

}, we can rewrite P asP(θ)=cos(θ) Z +sin(θ) X,  (21)and rewrite the generalized reflections U(x_(2j−1)) and V(x_(2j)) asU(θ;x _(2j−1))=cos(x _(2j−1))Ī−1 sin(x _(2j−1))(cos(θ) Z +sin(θ) X)  (22)andV(x _(2j))=cos(x _(2j))Ī−i sin(x _(2j)) Z,  (23)where x_(2j−1), x_(2j)∈

are tunable parameters. Then the unitary operator Q implemented by theL-layer circuit becomesQ(θ;{right arrow over (x)})=V(x _(2L))U(θ;x _(2L−1)) . . . V(x ₂)U(θ;x₁).  (24)

Note that in this picture, |A

=|0) is fixed, while P=P(θ), U(x)=U(θ; x) and Q({right arrow over(x)})=Q(θ; {right arrow over (x)}) depend on the unknown quantity θ. Itturns out to be more convenient to design and analyze the estimationalgorithms in this “logical” picture than in the original “physical”picture. Therefore, this picture will be used for the remainder of thisdisclosure.

The engineered likelihood function (i.e. the probability distribution ofmeasurement outcome d∈{0, 1}) depends on the output state ρ(θ; {rightarrow over (x)}) of the circuit and the observable P(θ).

Precisely, it is

$\begin{matrix}{{\left( {\left. d \middle| \theta \right.;\overset{\rightarrow}{x}} \right) = \frac{1 + {\left( {- 1} \right)^{d}{\Delta\left( {\theta;\overset{\rightarrow}{x}} \right)}}}{2}},{where}} & (25)\end{matrix}$ $\begin{matrix}{{\Delta\left( {\theta;\overset{\rightarrow}{x}} \right\rangle} = \left\langle \overset{\_}{0} \middle| {{Q\left( {\theta;\overset{\rightarrow}{x}} \right\rangle}^{\dagger}{P(\theta)}{Q\left( {\theta;\overset{\rightarrow}{x}} \right)}} \middle| \overset{\_}{0} \right\rangle} & (26)\end{matrix}$is the bias of the likelihood function (from now on, we will use

′(d|θ; {right arrow over (x)}) and Δ′(θ; {right arrow over (x)}) todenote the derivatives of

(d|θ; {right arrow over (x)}) and Δ(θ; {right arrow over (x)}) withrespect to θ, respectively). In particular, if

${\overset{\rightarrow}{x} = \left( {\frac{\pi}{2},\frac{\pi}{2},\ldots,\frac{\pi}{2},\frac{\pi}{2}} \right)},$then we have Δ(θ; {right arrow over (x)})=cos ((2L+1)θ). Namely, thebias of the likelihood function for this {right arrow over (x)} is theChebyshev polynomial of degree 2L+1 (of the first kind) of Π. For thisreason, the likelihood function for this {right arrow over (x)} will bereferred to herein as the Chebyshev likelihood function (CLF). Section 5will explore the performance gap between CLFs and general ELFs.

In reality, quantum devices are subject to noise. To make the estimationprocess robust against errors, embodiments of the present invention mayincorporate the following noise model into the likelihood function.

In practice, the establishment of the noise model may leverage aprocedure for calibrating the likelihood function for the specificdevice being used. With respect to Bayesian inference, the parameters ofthis model are known as nuisance parameters; the target parameter doesnot depend directly on them, but they determine how the data relates tothe target parameter and, hence, may be incorporated into the inferenceprocess. The remainder of this disclosure will assume that the noisemodel has been calibrated to sufficient precision so as to render theeffect of model error negligible.

Assume that the noisy version of each circuit layer V(x_(2j))U(θ;x_(2j−1)) implements a mixture of the target operation and thecompletely depolarizing channel³ acting on the same input state, i.e.

$\begin{matrix}{{{\mathcal{U}_{j}(\rho)} = {{{{pV}\left( x_{2j} \right)}{U\left( {\theta;x_{{2j} - 1}} \right)}\rho{U\left( {\theta;x_{{2j} - 1}} \right)}^{\dagger}{V\left( x_{2j} \right)}^{\dagger}} + {\left( {1 - p} \right)\frac{I}{2^{n}}}}},} & (27)\end{matrix}$where p is the fidelity of this layer. Under composition of suchimperfect operations, the output state of the L-layer circuit becomes

$\begin{matrix}{\rho_{L} = {{p^{L}{Q\left( {\theta;\overset{\rightarrow}{x}} \right)}\left. ❘A \right\rangle\left\langle A❘ \right.{Q\left( {\theta;\overset{\rightarrow}{x}} \right)}^{\dagger}} + {\left( {1 - p^{L}} \right){\frac{I}{2^{n}}.}}}} & (28)\end{matrix}$

This imperfect circuit is preceded by an imperfect preparation of |A

and followed by an imperfect measurement of P. In the context ofrandomized benchmarking, such errors are referred to as statepreparation and measurement (SPAM) errors. Embodiments of the presentinvention may also model SPAM error with a depolarizing model, takingthe noisy preparation of |A

to be

${p_{SP}\left. A \right\rangle\left\langle A \right.} + {\left( {1 - p_{SP}} \right)\frac{I}{2^{n}}}$and taking the noisy measurement of P to be the POVM

$\left\{ {{{p_{M}\frac{I + P}{2}} + {\left( {1 - p_{M}} \right)\frac{I}{2}}},{{p_{M}\frac{I - P}{2}} + {\left( {1 - p_{M}} \right)\frac{I}{2}}}} \right\}.$Combining the SPAM error parameters into p=p_(SP)p_(M), arrives at amodel for the noisy likelihood function

(d|θ;ƒ,{right arrow over (x)})=½[1+(−1)^(d)ƒΔ(θ;{right arrow over(x)})],  (29)where ƒ=pp^(L) is the fidelity of the whole process for generating theELF, and Δ(θ, {right arrow over (x)}) is the bias of the ideallikelihood function as defined in Eq. (26) (from now on, this disclosurewill use

′(d|θ; ƒ, ěcx) to denote the derivative of

(d|θ; ƒ, {right arrow over (x)}) with respect to θ). Note that theoverall effect of noise on the ELF is that it rescales the bias by afactor of ƒ. This implies that the less errored the generation processis, the steeper the resultant ELF is (which means it is more useful forBayesian inference), as one would expect. ³The depolarizing modelassumes that the gates comprising each layer are sufficiently random toprevent systematic build-up of coherent error. There exist techniquessuch as randomized compiling which make this depolarizing model moreaccurate.

Before moving on to the discussion of Bayesian inference with ELFs, itis worth mentioning the following property of engineered likelihoodfunctions, as it will play a role in Section 4. The concepts oftrigono-multilinear and trigono-multiquadratic functions are known.Basically, a multivariable function ƒ:

^(k)→

is trigono-multilinear if for any j∈{1, 2, . . . , k}, ƒ(x₁, x₂, . . . ,x_(k)) may be written asƒ(x ₁ ,x ₂ , . . . ,x _(k))=C _(j)({right arrow over (x)} _(¬j))cos(x_(j))+S _(j)({right arrow over (x)} _(¬j))sin(x _(j)),  (30)for some (complex-valued) functions C_(j) and S_(j) of {right arrow over(x)}_(¬j):=(x₁, . . . , x_(j−1), x_(j+1), x_(k)), and we call C_(j) andS_(j) the cosine-sine-decomposition (CSD) coefficient functions of ƒwith respect to x_(j). Similarly, a multivariable function ƒ:

^(k)→

is trigono-multiquadratic if for any j∈{1, 2, . . . , k}, ƒ(x₁, x₂, . .. , x_(k)) may be written as

$\begin{matrix}{{{f\left( {x_{1},x_{2},\ldots\;,x_{k}} \right)} = {{{C_{j}\left( {\overset{\rightarrow}{x}}_{⫬ j} \right)}\mspace{14mu}\cos\mspace{14mu}\left( {2x_{j}} \right)} + {{S_{j}\left( {\overset{\rightarrow}{x}}_{⫬ j} \right)}\mspace{14mu}\sin\mspace{14mu}\left( {2x_{j}} \right)} + {B_{j}\left( {\overset{\rightarrow}{x}}_{⫬ j} \right)}}},} & (31)\end{matrix}$for some (complex-valued) functions C_(j), S_(j) and B_(j) of {rightarrow over (x)}_(¬j):=(x₁, . . . , x_(j−1), x_(j+1), x_(k)), and we callC_(j), S_(j) and B_(j) the cosine-sine-bias-decomposition (CSBD)coefficient functions of ƒ with respect to x_(j). The concepts oftrigono-multilinearity and trigono-multiquadraticity can be alsonaturally generalized to linear operators. Namely, a linear operator istrigono-multilinear (or trigono-multiquadratic) in a set of variables ifeach entry of this operator (written in an arbitrary basis) istrigono-multilinear (or trigono-multiquadratic) in the same variables.Now Eqs. (22), (23) and (24) imply that Q(θ; {right arrow over (x)}) isa trigono-multilinear operator of {right arrow over (x)}. Then itfollows from Eq. (26) that Δ(θ; {right arrow over (x)}) is atrigono-multiquadratic function of {right arrow over (x)}. Furthermore,it is disclosed that the CSBD coefficient functions of Δ(θ; {right arrowover (x)}) with respect to any x_(j) may be evaluated in O(L) time, andthis greatly facilitates the construction of the algorithms in Section4.1 for tuning the circuit angles {right arrow over (x)}=(x₁, x₂, . . ., x_(2L−1), x_(2L)).3.2. Bayesian Inference with Engineered Likelihood Functions

With the model of (noisy) engineered likelihood functions in place,embodiments of the present invention will be described for tuning thecircuit parameters {right arrow over (x)} and performing Bayesianinference with the resultant likelihood functions for amplitudeestimation.

Let us begin with a high-level overview of embodiments of algorithm forestimating Π=cos(θ)=

A|P|A

. For convenience, such embodiments may work with θ=arccos(Π) ratherthan with Π. Embodiments of the present invention may use a Gaussiandistribution to represent knowledge of θ and make this distributiongradually converge to the true value of 0 as the inference processproceeds. Embodiments of the present invention may start with an initialdistribution of Π(which can be generated by standard sampling or domainknowledge) and convert it to the initial distribution of θ. Thenembodiments of the present invention may iterate the following procedureuntil a convergence criterion is satisfied. At each round, embodimentsof the present invention may find the circuit parameters {right arrowover (x)} that maximize the information gain from the measurementoutcome d in certain sense (based on current knowledge of θ). Then thequantum circuit in FIG. 7 is executed with the optimized parameters{right arrow over (x)} and a measurement outcome d∈{0, 1} is received.Finally, embodiments of the present invention may update thedistribution of θ by using Bayes rule, conditioned on d. Once this loopis finished, embodiments of the present invention may convert the finaldistribution of θ to the final distribution of Π, and set the mean ofthis distribution as the final estimate of Π. See FIG. 8 for aconceptual diagram of this algorithm.

The following describes each component of the above algorithm in moredetail. Throughout the inference process, embodiments of the presentinvention use a Gaussian distribution to keep track of a belief of thevalue of θ. Namely, at each round, θ has prior distribution

$\begin{matrix}{{p(\theta)} = {{{p\left( {{\theta;\mu},\sigma} \right)}\text{:}} = {\frac{1}{\sqrt{2\pi}\sigma}e^{- \frac{{({\theta - \mu})}^{2}}{2\sigma^{2}}}}}} & (32)\end{matrix}$for some prior mean μ∈

and prior variance σ²∈

⁺. After receiving the measurement outcome d, embodiments of the presentinvention may compute the posterior distribution of θ by using Bayesrule

$\begin{matrix}{{{p\left( {{{\theta ❘d};f},\overset{\rightarrow}{x}} \right)} =},} & (33)\end{matrix}$where the normalization factor, or model evidence, is defined as

(d; ƒ, {right arrow over (x)})=∫dθ

(d|θ; ƒ, {right arrow over (x)})p(θ) (recall that ƒ is the fidelity ofthe process for generating the ELF). Although the true posteriordistribution will not be a Gaussian, embodiments of the presentinvention may approximate it as such. Following previous methodology,embodiments of the present invention may replace the true posterior witha Gaussian distribution of the same mean and variance⁴, and set it asthe prior of θ for the next round. Embodiments of the present inventionmay repeat this measurement-and-Bayesian-update procedure until thedistribution of θ is sufficiently concentrated around a single value.⁴Although embodiments of the present invention may compute the mean andvariance of the posterior distribution p(θ|d; ƒ, {right arrow over (x)})directly by definition, this approach is time-consuming, as it involvesnumerical integration. Instead, embodiments of the present invention mayaccelerate this process by taking advantage of certain property ofengineered likelihood functions. See Section 4.2 for more details.

Since the algorithm mainly works with θ and we are eventually interestedin FT embodiments of the present invention may make conversions betweenthe estimators of θ and Π. This is done as follows. Suppose that atround t the prior distribution of θ is

(μ_(t)σ_(t) ²) and the prior distribution of Π is

({circumflex over (μ)}_(t), {circumflex over (σ)}_(t) ²) (note thatμ_(t), σ_(t), {circumflex over (μ)}_(t) and {circumflex over (σ)} arerandom variables as they depend on the history of random measurementoutcomes up to time t). The estimators of θ and Π at this round areμ_(t) and {circumflex over (μ)}_(t), respectively. Given thedistribution

(μ_(t), σ_(t) ²) of θ, embodiments of the present invention may computethe mean {circumflex over (μ)}_(t) and variance {circumflex over(σ)}_(t) ² of cos(θ), and set

({circumflex over (μ)}_(t), {circumflex over (σ)}_(t) ²) on as thedistribution of Π. This step may be done analytically, as if X˜

(μ, σ²), then

$\begin{matrix}{{{{\mathbb{E}}\left\lbrack {\cos\mspace{14mu}(X)} \right\rbrack} = {e^{- \frac{\sigma^{2}}{2}}\mspace{14mu}\cos\mspace{14mu}(\mu)}},} & (34) \\{{{Var}\left\lbrack {\cos\mspace{14mu}(X)} \right\rbrack} = {\frac{1}{2}\left( {1 - e^{- \sigma^{2}}} \right)*{\left( {1 - {e^{- \sigma^{2}}\mspace{14mu}\cos\mspace{14mu}\left( {2\mu} \right)}} \right).}}} & (35)\end{matrix}$

Conversely, given the distribution

({right arrow over (μ)}_(t), {circumflex over (σ)}_(t) ²) of Π,embodiments of the present invention may compute the mean μ_(t) andvariance σ_(t) ² of arccos(Π) (clipping Π to [−1, 1]), and set

(μ_(t), σ_(t) ²) as the distribution of θ. This step may be donenumerically. Even though the cos x or arccos y function of a Gaussianvariable is not truly Gaussian, embodiments of the present invention mayapproximate it as such and find that this has negligible impact on theperformance of the algorithm.

Method for tuning the circuit angles {right arrow over (x)} may beimplemented by embodiments of the present invention as follows. Ideally,they may be chosen carefully so that the mean squared error (MSE) of theestimator μ_(t) of θ decreases as fast as possible as t grows. Inpractice, however, it is hard to compute this quantity directly, andembodiments of the present invention may resort to a proxy of its value.The MSE of an estimator is a sum of the variance of the estimator andthe squared bias of the estimator. The squared bias of μ_(t) may besmaller than its variance, i.e. |

[μ_(t)]−θ*|²<Var (μ_(t)), where θ* is the true value of θ. The varianceσ_(t) ² of θ is often close to the variance of μ_(t), i.e. σ_(t)²≈Var(μ_(t)) with high probability. Combining these facts, we know thatMSE (μ_(t))≤2σ_(t) ² with high probability. So embodiments of thepresent invention may find the parameters {right arrow over (x)} thatminimize the variance σ_(t) ² of θ instead.

Specifically, suppose θ has prior distribution

(μ, σ²). Upon receiving the measurement outcome d∈{0, 1}, the expectedposterior variance of θ is

$\begin{matrix}{{{{\mathbb{E}}_{d}\left\lbrack {{Var}\mspace{14mu}\left( {{{\theta ❘d};f},\overset{\rightarrow}{x}} \right)} \right\rbrack} = {\sigma^{2}\left( {1 - {\sigma^{2}\frac{{f^{2}\left( {\partial_{\mu}{b\left( {\mu,{\sigma;\overset{\rightarrow}{x}}} \right)}} \right)}^{2}}{1 - {f^{2}\left( {b\left( {\mu,{\sigma;\overset{\rightarrow}{x}}} \right)} \right)}^{2}}}} \right)}},{where}} & (36) \\{{b\left( {\mu,{\sigma;\overset{\rightarrow}{x}}} \right)} = {\int_{- \infty}^{\infty}{d\mspace{14mu}\theta\mspace{14mu}{p\left( {{\theta;\mu},\sigma} \right)}{\Delta\left( {\theta;\overset{\rightarrow}{x}} \right)}}}} & (37)\end{matrix}$in which Δ(θ; {right arrow over (x)}) is the bias of the ideallikelihood function as defined in Eq. (26), and ƒ is the fidelity of theprocess for generating the likelihood function. A quantity forengineering likelihood functions is now introduced and referred toherein as the variance reduction factor,

$\begin{matrix}{{{\mathcal{V}\left( {\mu,{\sigma;f},\overset{\rightarrow}{x}} \right)}\text{:}} = {\frac{{f^{2}\left( {\partial_{\mu}{b\left( {\mu,{\sigma;\overset{\rightarrow}{x}}} \right)}} \right)}^{2}}{1 - {f^{2}\left( {b\left( {\mu,{\sigma;\overset{\rightarrow}{x}}} \right)} \right)}^{2}}.}} & (38)\end{matrix}$Then we have

_(d)[Var(θ|d;ƒ,{right arrow over (x)})]=σ²[1−σ² V(μ,σ;ƒ,{right arrowover (x)})].  (39)

The larger

is, the faster the variance of θ decreases on average. Furthermore, toquantify the growth rate (per time step) of the inverse variance of θ,the following quantity may be used

$\begin{matrix}{{{R\left( {\mu,{\sigma;f},\overset{\rightarrow}{x}} \right)}\text{:}} = {\frac{1}{T(L)}\left( {\frac{1}{{\mathbb{E}}_{d}\left\lbrack {{Var}\mspace{14mu}\left( {{{\theta ❘d};f},\overset{\rightarrow}{x}} \right)} \right\rbrack} - \frac{1}{\sigma^{2}}} \right)}} & (40) \\{{= {\frac{1}{T(L)}\frac{\mathcal{V}\left( {\mu,{\sigma;f},\overset{\rightarrow}{x}} \right)}{1 - {\sigma^{2}{\mathcal{V}\left( {\mu,{\sigma;f},\overset{\rightarrow}{x}} \right)}}}}},} & (41)\end{matrix}$where T(L) is the time cost of an inference round. Note that R is amonotonic function of

for

∈(0, 1). Therefore, when the number L of circuit layers is fixed,embodiments of the present invention may maximize R (with respect to{right arrow over (x)}) by maximizing

. In addition, when σ is small, R is approximately proportional to

, i.e. R≈

/T(L). The remainder of this disclosure will assume that the ansatzcircuit contributes most significantly to the duration of the overallcircuit. We take T(L) to be proportional to the number of times theansatz is invoked in the circuit, setting T(L)=2L+1, where time is inunits of ansatz duration.

Now techniques will be disclosed for finding the parameters {right arrowover (x)}=(x₁, x₂, . . . , x_(2L))∈

^(2L) that maximize the variance reduction factor

(μ, σ; ƒ, {right arrow over (x)}) for given μ∈

, σ∈

⁺ and ƒ∈[0, 1]. This optimization problem turns out to be difficult tosolve in general. Fortunately, in practice, embodiments of the presentinvention may assume that the prior variance σ² of θ is small (e.g. atmost 0.01), and in this case,

(μ, σ; ƒ, {right arrow over (x)}) may be approximated by the Fisherinformation of the likelihood function

(d|θ; ƒ, {right arrow over (x)}) at θ=μ, i.e.

$\begin{matrix}{{{\mathcal{V}\left( {\mu,{\sigma;f},\overset{\rightarrow}{x}} \right)} \approx {\left( {{\mu;f},\overset{\rightarrow}{x}} \right)}},{{when}\mspace{14mu}\sigma\mspace{14mu}{is}\mspace{14mu}{small}},{where}} & (42) \\{{\left( {{\theta;f},\overset{\rightarrow}{x}} \right)} = {{\mathbb{E}}_{d}\left\lbrack \left( {{\partial_{\theta}\mspace{14mu}\log}\mspace{14mu}\mspace{14mu}\left( {{{d❘\theta};f},\overset{\rightarrow}{x}} \right)} \right)^{2} \right\rbrack}} & (43) \\{= \frac{{f^{2}\left( {\Delta^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)} \right)}^{2}}{1 - {f^{2}\left( {\Delta\left( {\theta;\overset{\rightarrow}{x}} \right)} \right)}^{2}}} & (44)\end{matrix}$is the Fisher information of the two-outcome likelihood function

(d|θ; ƒ, {right arrow over (x)}) as defined in Eq. (29). Therefore,rather than directly optimizing the variance reduction factor

(μ, σ; ƒ, {right arrow over (x)}), embodiments of the present inventionmay optimize the Fisher information

(μ; ƒ, {right arrow over (x)}), which may be done efficiently byembodiments of the present invention using the algorithms in Section4.1.1. Furthermore, when the fidelity ƒ of the process for generatingthe ELF is low, we have

(θ; ƒ, {right arrow over (x)})≈ƒ²(Δ′(θ; {right arrow over (x)}))². Itfollows that

(μ,σ;ƒ,{right arrow over (x)})≈ƒ²(Δ′(μ;{right arrow over (x)}))²,whenboth σ and ƒ are small.  (45)

So in this case, embodiments of the present invention may optimize|Δ′(μ; {right arrow over (x)})|, which is proportional to the slope ofthe likelihood function

(d|θ; ƒ, {right arrow over (x)}) at θ=μ, and this task may beaccomplished efficiently by embodiments of the present invention usingthe algorithms in Section 4.1.2.

Finally, embodiments of the present invention may make a prediction onhow fast the MSE of the estimator {right arrow over (μ)}_(t) of Π as tgrows. Suppose that the number L of circuit layers is fixed during theinference process. This gives

${{MSE}\mspace{14mu}\left( {\hat{\mu}}_{t} \right)} = \left. {{\Theta\left( \frac{1}{t} \right)}{as}\mspace{14mu} t}\rightarrow{\infty.} \right.$The growth rate of the inverse MSE of {circumflex over (β)}_(t) may bepredicted as follows. As t→∞, we have μ_(t)→θ*, σ_(t)→0, {right arrowover (μ)}_(t)→Π*, and {circumflex over (σ)}_(t)→0 with high probability,where θ* and Π* are the true values of θ and Π, respectively. When thisevent happens, we have that for large t,

$\begin{matrix}{{\frac{1}{\sigma_{t + 1}^{2}} - \frac{1}{\sigma_{t}^{2}}} \approx {\left( {{\mu_{t};f},{\overset{\rightarrow}{x}}_{t}} \right).}} & (46)\end{matrix}$

Consequently, by Eq. (35), we know that for large t,

$\begin{matrix}{{{\frac{1}{{\hat{\sigma}}_{t + 1}^{2}} - \frac{1}{{\hat{\sigma}}_{t}^{2}}} \approx \frac{\left( {{\mu_{t};f},{\overset{\rightarrow}{x}}_{t}} \right)}{\sin\limits^{2}\left( \mu_{t} \right)}},} & (47)\end{matrix}$where μ_(t)≈arccos({circumflex over (μ)}_(t)). Since the bias of{circumflex over (μ)}_(t) is often much smaller than its standarddeviation, and the latter can be approximated by {circumflex over(σ)}_(t), we predict that for large t,

$\begin{matrix}{{{MSE}\mspace{14mu}\left( {\hat{\mu}}_{t} \right)} \approx {\frac{1 - {\hat{\mu}}_{t}^{2}}{t\left( {{{\arccos\mspace{14mu}\left( {\hat{\mu}}_{t} \right)};f},{\overset{\rightarrow}{x}}_{t}} \right)}.}} & (48)\end{matrix}$

This means that the asymptotic growth rate (per time step) of theinverse MSE of {circumflex over (μ)}_(t) should be roughly

$\begin{matrix}{{{{{\hat{R}}_{0}\left( {{\Pi^{*};f},\overset{\rightarrow}{x}} \right)}\text{:}} = \frac{\left( {{{\arccos\mspace{14mu}\left( \Pi^{*} \right)};f},\overset{\rightarrow}{x}} \right)}{{T(L)}\left( {1 - \left( \Pi^{*} \right)^{2}} \right)}},} & (49)\end{matrix}$where {right arrow over (x)} is optimized with respect to μ*=arccos(Π*). This rate will be compared with the empirical growth rate of theinverse MSE of {circumflex over (μ)}_(t) in Section 5.4. Efficient Heuristic Algorithms for Circuit Parameter Tuning andBayesian Inference

This section describes embodiments of heuristic algorithms for tuningthe parameters {right arrow over (x)} of the circuit in FIG. 7 anddescribes how embodiments of the present invention may efficiently carryout Bayesian inference with the resultant likelihood functions.

4.1. Efficient Maximization of Proxies of the Variance Reduction Factor

Algorithms, implemented according to embodiments of the presentinvention, for tuning the circuit angles {right arrow over (x)} arebased on maximizing two proxies of the variance reduction factor

the Fisher information and slope of the likelihood function

(d|θ; ƒ, {right arrow over (x)}). All of these algorithms requireefficient procedures for evaluating the CSBD coefficient functions ofthe bias Δ(θ; {right arrow over (x)}) and its derivative Δ′(θ; {rightarrow over (x)}) with respect to x_(j) for j=1, 2, . . . , 2L. Recallthat we have shown in Section 3.1 that the bias Δ(θ; {right arrow over(x)}) is trigono-multiquadratic in ěcx. Namely, for any j∈{1, 2, . . . ,2L}, there exist functions C_(j)(θ; {right arrow over (x)}_(¬j)),S_(j)(θ; {right arrow over (x)}_(¬j)) and B_(j)(θ; {right arrow over(x)}_(¬j)) of {right arrow over (x)}_(¬j):=(x₁, . . . , x_(j−1),x_(j+1), . . . , x_(2L)) such thatΔ(θ;{right arrow over (x)})=C _(j)(θ;{right arrow over (x)} _(¬j))cos(2x_(j))+S _(j)(θ;{right arrow over (x)} _(¬j))sin(2x _(j))+B _(j)(θ;{rightarrow over (x)} _(¬j)).  (50)It follows thatΔ′(θ;{right arrow over (x)})=C _(j)′(θ;{right arrow over (x)}_(¬j))cos(2x _(j))+S _(j)′(θ;x _(¬j))sin(2x _(j))+B′ _(j)(θ;{right arrowover (x)} _(¬j))  (51)is also trigono-multiquadratic in {right arrow over (x)}, whereC′_(j)(θ; {right arrow over (x)}_(¬j))=∂_(θ)C_(j)(θ; {right arrow over(x)}_(¬j)), S′_(j)(θ; {right arrow over (x)}_(¬j))=∂_(θ)S_(j)(θ; {rightarrow over (x)}_(¬j)), B′_(j)(θ; {right arrow over(x)}_(¬j))=∂_(θ)B_(j)(θ; {right arrow over (x)}_(¬j)) are thederivatives of C_(j)(θ; {right arrow over (x)}_(¬j)), S_(j)(θ; {rightarrow over (x)}_(¬j)), B_(j)(θ; {right arrow over (x)}_(¬j)) withrespect to θ, respectively. It turns out that given θ and {right arrowover (x)}_(¬j), C_(j)(θ; {right arrow over (x)}_(¬j)), each of S_(j)(θ;{right arrow over (x)}_(¬j)), B_(j)(θ; {right arrow over (x)}_(¬j)),C′_(j)(θ; {right arrow over (x)}_(¬j)), S′_(j)(θ; {right arrow over(x)}_(¬j)) and B′_(j)(θ; {right arrow over (x)}_(¬j)) can be computed inO(L) time.

Lemma 1. Given θ and {right arrow over (x)}_(¬j), each of C_(j)(θ;{right arrow over (x)}_(¬j)), S_(j)(θ; {right arrow over (x)}_(¬j)),B_(j)(θ; {right arrow over (x)}_(¬j)), C′_(j)(θ; {right arrow over(x)}_(¬j)), S′_(j)(θ; {right arrow over (x)}_(¬j)) and B′_(j)(θ; {rightarrow over (x)}_(¬j)) can be computed in O(L) time.

Proof. See Appendix C.

4.1.1. Maximizing the Fisher Information of the Likelihood Function

Embodiments of the present invention may execute one or more of twoalgorithms for maximizing the Fisher information of the likelihoodfunction

(d|θ; ƒ, {right arrow over (x)}) at a given point θ=μ(i.e. the priormean of θ). Assume that a goal is to find {right arrow over (x)}∈

^(2L) that maximize

$\begin{matrix}{\left( {{\mu;f},\overset{\rightarrow}{x}} \right) = {\frac{{f^{2}\left( {\Delta^{\prime}\left( {\mu;\overset{\rightarrow}{x}} \right)} \right)}^{2}}{1 - {f^{2}{\Delta\left( {\mu;\overset{\rightarrow}{x}} \right)}^{2}}}.}} & (52)\end{matrix}$

The first algorithm is based on gradient ascent. Namely, it starts witha random initial point, and keeps taking steps proportional to thegradient of

at the current point, until a convergence criterion is satisfied.Specifically, let {right arrow over (x)}^((t)) be the parameter vectorat iteration t. Embodiments of the present invention may update it asfollows:{right arrow over (x)} ^((t+1)) ={right arrow over (x)} ^((t))+δ(t)∇

(μ;ƒ,{right arrow over(x)})|_({right arrow over (x)}={right arrow over (x)}) _((t))   (53)where δ:

^(≥0)→

⁺ is the step size schedule⁵. This requires the calculation of thepartial derivative of

(μ; ƒ, {right arrow over (x)}) with respect to each x_(j), which can bedone as follows. Embodiments of the present invention first use theprocedures in Lemma 1 to compute C_(j):=C_(j)(μ; {right arrow over(x)}_(¬j)), S_(j):=S_(j)(μ; {right arrow over (x)}_(¬j)),B_(j)::=B_(j)(μ; {right arrow over (x)}_(¬j)), C′_(j):C′_(j)(μ; {rightarrow over (x)}_(¬j)), S′_(j):=S′_(j)(μ; {right arrow over (x)}_(¬j)),and B′_(j):=B′_(j)(μ; {right arrow over (x)}_(¬j)) for each j. Thisobtains

$\begin{matrix}{{{\Delta:} = {{\Delta\left( {\mu;\overset{\rightarrow}{x}} \right)} = {{C_{j}\cos\left( {2x_{j}} \right)} + {S_{j}\sin\left( {2x_{j}} \right)} + B_{j}}}},} & (54)\end{matrix}$ $\begin{matrix}{{{\Delta^{\prime}:} = {{\Delta^{\prime}\left( {\mu;\overset{\rightarrow}{x}} \right)} = {{C_{j}^{\prime}\cos\left( {2x_{j}} \right)} + {S_{j}^{\prime}\sin\left( {2x_{j}} \right)} + B_{j}^{\prime}}}},} & (55)\end{matrix}$ $\begin{matrix}{{{\chi_{j}:} = {\frac{\partial{\Delta\left( {\mu;\overset{\rightarrow}{x}} \right)}}{\partial x_{j}} = {2\left( {{{- C_{j}}\sin\left( {2x_{j}} \right)} + {S_{j}\cos\left( {2x_{j}} \right)}} \right)}}},} & (56)\end{matrix}$ $\begin{matrix}{{{\chi_{j}^{\prime}:} = {\frac{\partial{\Delta^{\prime}\left( {\mu;\overset{\rightarrow}{x}} \right)}}{\partial x_{j}} = {2\left( {{{- C_{j}^{\prime}}\sin\left( {2x_{j}} \right)} + {S_{j}^{\prime}\cos\left( {2x_{j}} \right)}} \right)}}};} & (57)\end{matrix}$

Knowing these quantities, embodiments of the present invention maycompute the partial derivative of

(μ; ƒ, {right arrow over (x)}) with respect to x_(j) as follows:

$\begin{matrix}{{\gamma_{j}:} = {\frac{\partial\left( {{\mu;f},\overset{\rightarrow}{x}} \right)}{\partial x_{j}} = {\frac{2{f^{2}\left\lbrack {{\left( {1 - {f^{2}\Delta^{2}}} \right)\Delta^{\prime}\chi_{j}^{\prime}} + {f^{2}{{\Delta\chi}_{j}\left( \Delta^{\prime} \right)}^{2}}} \right\rbrack}}{\left\lbrack {1 - {f^{2}\Delta^{2}}} \right\rbrack^{2}}.}}} & (58)\end{matrix}$

Embodiments of the present invention may repeat this procedure for j=1,2, . . . , 2L. Then embodiments of the present invention may obtain ∇

(μ; ƒ, {right arrow over (x)})=(γ₁, γ₂, . . . , γ_(2L)). Each iterationof the algorithm takes O(L²) time. The number of iterations in thealgorithm depends on the initial point, the termination criterion andthe step size schedule δ. See Algorithm 65 for more details. ⁵In thesimplest case, δ(t)=δ is constant. But in order to achieve betterperformance, we might want δ(t)→0 as t→∞.

The second algorithm is based on coordinate ascent. Unlike gradientascent, this algorithm does not require step sizes, and allows eachvariable to change dramatically in a single step. As a consequence, itmay converge faster than the previous algorithm. Specifically,embodiments of the present invention which implement this algorithm maystart with a random initial point, and successively maximize theobjective function

(μ; ƒ, {right arrow over (x)}) along coordinate directions, until aconvergence criterion is satisfied. At the j-th step of each round, itsolves the following single-variable optimization problem for acoordinate x_(j):

$\begin{matrix}{{\underset{z}{\arg\max}\frac{{f^{2}\left( {{C_{j}^{\prime}\cos\left( {2z} \right)} + {S_{j}^{\prime}\sin\left( {2z} \right)} + B_{j}^{\prime}} \right)}^{2}}{1 - {f^{2}\left( {{C_{j}\cos\left( {2z} \right)} + {S_{j}\sin\left( {2z} \right)} + B_{j}} \right)}^{2}}},} & (59)\end{matrix}$where C_(j)=C_(j)(μ; {right arrow over (x)}_(¬j)), S_(j)=S_(j)(μ; {rightarrow over (x)}_(¬j)), B_(j)=B_(j)(μ; {right arrow over (x)}_(¬j)),C′_(j)=C′_(j)(μ; {right arrow over (x)}_(¬j)), S′_(j)=S′_(j)(μ; {rightarrow over (x)}_(¬j)), B′_(j)=B′_(j)(μ; {right arrow over (x)}_(¬j)) maybe computed in O(L) time by the procedures in Lemma 1. Thissingle-variable optimization problem may be tackled by standardgradient-based methods, and we set x_(j) to be its solution. Repeat thisprocedure for j=1, 2, . . . , 2L. This algorithm produces a sequence{right arrow over (x)}⁽⁰⁾, {right arrow over (x)}⁽¹⁾, {right arrow over(x)}⁽²⁾, . . . , such that

(μ; ƒ, {right arrow over (x)}⁰)≤

(μ; ƒ, {right arrow over (x)}¹)≤

(μ; ƒ, {right arrow over (x)}²)≤ . . . . Namely, the value of

(μ; ƒ, {right arrow over (x)}^(t)) increases monotonically as t grows.Each round of the algorithm takes θ(L²) time. The number of rounds inthe algorithm depends on the initial point and the terminationcriterion.4.1.2. Maximizing the Slope of the Likelihood Function

Embodiments of the present invention may perform one or more of twoalgorithms for maximizing the slope of the likelihood function

(d|θ; ƒ, {right arrow over (x)}) at a given point θ=μ(i.e. the priormean of θ). Assume that a goal is to find {right arrow over (x)}∈

^(2L) that maximize |

′(μ; ƒ, {right arrow over (x)})|=ƒ|Δ′(μ; {right arrow over (x)})|/2.

Similar to Algorithms 65 and 65 for Fisher information maximization, thealgorithms for slope maximization are also based on gradient ascent andcoordinate ascent, respectively. They both call the procedures in Lemma1 to evaluate C′(μ; {right arrow over (x)}_(¬j)), S′(μ; {right arrowover (x)}_(¬j)) and B′(μ; {right arrow over (x)}_(¬j)) for given μ and{right arrow over (x)}_(¬j). However, the gradient-ascent-basedalgorithm uses the above quantities to compute the partial derivative of(Δ′(μ; {right arrow over (x)}))² with respect to x_(j), while thecoordinate-ascent-based algorithm uses them to directly update the valueof x_(j). These algorithms are formally described in Algorithms 1 and 2,respectively.

4.2. Approximate Bayesian Inference with Engineered Likelihood Functions

With the algorithms for tuning the circuit parameters in place, we nowdescribe how to efficiently carry out Bayesian inference with theresultant likelihood functions. Embodiments of the present invention maycompute the posterior mean and variance of θ directly after receiving ameasurement outcome d. But this approach is time-consuming, as itinvolves numerical integration. By taking advantage of certain propertyof the engineered likelihood functions, embodiments of the presentinvention may greatly accelerate this process.

Suppose θ has prior distribution

(μ, σ²), where σ«1/L, and the fidelity of the process for generating theELF is ƒ. Embodiments of the present invention may find that theparameters {right arrow over (x)}=(x₁, x₂, . . . , x_(2L)) that maximize

(μ; ƒ, {right arrow over (x)}) (or |Δ′(μ; ƒ, {right arrow over (x)})|)satisfy the following property: When θ is close to μ, i.e. θ∈[μ−O(σ),μ+O(σ)], we have

$\begin{matrix}{\left( {{{d❘\theta};f},\overset{\rightarrow}{x}} \right) \approx \frac{1 + {\left( {- 1} \right)^{d}f\sin\left( {{r\theta} + b} \right)}}{2}} & (67)\end{matrix}$for some r, b∈

. Namely, embodiments of the present invention may approximate Δ(θ;{right arrow over (x)}) by a sinusoidal function in this region of θ.FIG. 13 illustrates one such example.

Embodiments of the present invention may find the best-fitting r and bby solving the following least squares problem:

$\begin{matrix}{{\left( {r^{*},b^{*}} \right) = {\underset{r,b}{\arg\min}{\sum\limits_{\theta \in \Theta}{❘{{\arcsin\left( {\Delta\left( {\theta;\overset{\rightarrow}{x}} \right)} \right)} - {r\theta} - b}❘}^{2}}}},} & (68)\end{matrix}$where Θ={θ₁, θ₂, . . . , θ_(k)}⊆[μ−O(σ), μ+O(σ)]. This least-squaresproblem has the following analytical solution:

$\begin{matrix}{{\begin{pmatrix}r^{*} \\b^{*}\end{pmatrix} = {{A^{+}z} = {\left( {A^{T}A} \right)^{- 1}A^{T}z}}},{where}} & (69)\end{matrix}$ $\begin{matrix}{{A = \begin{pmatrix}\theta_{1} & 1 \\\theta_{2} & 1 \\ \vdots & \vdots \\\theta_{k} & 1\end{pmatrix}},{z = {\begin{pmatrix}{\arcsin\left( {\Delta\left( {\theta_{1};\overset{\rightarrow}{x}} \right)} \right)} \\{\arcsin\left( {\Delta\left( {\theta_{2};\overset{\rightarrow}{x}} \right)} \right)} \\ \vdots \\{\arcsin\left( {\Delta\left( {\theta_{k};\overset{\rightarrow}{x}} \right)} \right)}\end{pmatrix}.}}} & (76)\end{matrix}$

FIG. 13 demonstrates an example of the true and fitted likelihoodfunctions. Once embodiments of the present invention obtain the optimumr and b, they may approximate the posterior mean and variance of θ bythe ones for

$\begin{matrix}{{\left( {{d❘\theta};f} \right) = \frac{1 + {\left( {- 1} \right)^{d}f\sin\left( {{r\theta} + b} \right)}}{2}},} & (77)\end{matrix}$which have analytical formulas. Specifically, suppose θ has priordistribution

(μ_(k), σ_(k) ²) at round k. Let d_(k) be the measurement outcome and(r_(k), b_(k)) be the best-fitting parameters at this round. Thenembodiments of the present invention may approximate the posterior meanand variance of θ by

$\begin{matrix}{{\mu_{k + 1} = {\mu_{k} + \frac{\left( {- 1} \right)^{d_{k}}{fe}^{{- r_{k}^{2}}\sigma_{k}^{2}/2}r_{k}\sigma_{k}^{2}\cos\left( {{r_{k}\mu_{k}} + b_{k}} \right)}{1 + {\left( {- 1} \right)^{d_{k}}{fe}^{{- r_{k}^{2}}\sigma_{k}^{2}/2}\sin\left( {{r_{k}\mu_{k}} + b_{k}} \right)}}}},} & (78)\end{matrix}$ $\begin{matrix}{\sigma_{k + 1}^{2} = {{\sigma_{k}^{2}\left( {1 - \frac{{fr}_{k}^{2}\sigma_{k}^{2}{e^{{- r_{k}^{2}}\sigma_{k}^{2}/2}\left\lbrack {{fe}^{{- r_{k}^{2}}\sigma_{k}^{2}/2} + {\left( {- 1} \right)^{d_{k}}\sin\left( {{r_{k}\mu_{k}} + b_{k}} \right)}} \right\rbrack}}{\left\lbrack {1 + {\left( {- 1} \right)^{d_{k}}{fe}^{{- r_{k}^{2}}\sigma_{k}^{2}/2}\sin\left( {{r_{k}\mu_{k}} + b_{k}} \right)}} \right\rbrack^{2}}} \right)}.}} & (79)\end{matrix}$

After that, embodiments of the present invention may proceed to the nextround, setting

(μ_(k+1), σ_(k+1) ²) as the prior distribution of θ for that round.

Note that, as FIG. 13 illustrates, the difference between the true andfitted likelihood functions can be large when θ is far from μ, i.e.|θ−μ|»σ. But since the prior distribution

${P(\theta)} = {\frac{1}{\sqrt{2\pi}\sigma}e^{- \frac{{({\theta - \mu})}^{2}}{2\sigma^{2}}}}$decays exponentially in |θ−μ|, such θ's have little contribution to thecomputation of posterior mean and variance of θ. So Eqs. (78) and (79)give highly accurate estimates of the posterior mean and variance of θ,and their errors have negligible impact on the performance of the wholealgorithm.5. Simulation Results

This section describes certain results of simulating Bayesian inferencewith engineered likelihood functions for amplitude estimation. Theseresults demonstrate certain advantages of certain engineered likelihoodfunctions over unengineered ones, as well as the impacts of circuitdepth and fidelity on their performance.

5.1. Experimental Details

In our experiments, we assume that it takes much less time to implementU(x)=exp (−ixP) and perform the projective measurement

$\left\{ {\frac{I + P}{2},\frac{I - P}{2}} \right\}$than to implement A. So when the number of circuit layer is L, the timecost of an inference round is roughly (2L+1)T(A), where T(A) is the timecost of A (note that an L-layer circuit makes 2L+1 uses of A and A^(†)).For simplicity, assume that A takes unit time (i.e. T(A)=1) in theupcoming discussion. Moreover, assume that there is no error in thepreparation and measurements of quantum states, i.e. p=1, in theexperiments.

Suppose we aim to estimate the expectation value Π=cos (θ)=

A|P|A

. Let {circumflex over (μ)}_(t) be the estimator of Π at time t. Notethat {circumflex over (μ)}_(t) itself is a random variable, since itdepends on the history of random measurement outcomes up to time t. Wemeasure the performance of a scheme by the root-mean-squared error(RMSE) of μ_(t), which is given byRMSE_(t)=MSE_(t)=

  (80)

The following will describe how fast RMSE_(t) decays as t grows forvarious schemes, including the ancilla-based Chebyshev likelihoodfunction (AB CLF), ancilla-based engineered likelihood function (ABELF), ancilla-free Chebyshev likelihood function (AF CLF), andancilla-free engineered likelihood function (AF ELF).

In general, the distribution of μ_(t) is difficult to characterize, andthere is no analytical formula for RMSE_(t). To estimate this quantity,embodiments of the present invention may execute the inference process Mtimes, and collect M samples {circumflex over (μ)}_(t) ⁽¹⁾, {circumflexover (μ)}_(t) ⁽²⁾, . . . , {circumflex over (μ)}_(t) ^((M)) of{circumflex over (μ)}_(t), where {circumflex over (μ)}^((i)) be theestimate of Π at time tin the i-th run, for i=1, 2, . . . , M. Thenembodiments of the present invention may use the quantity

$\begin{matrix}{{\overset{\_}{{RMSE}_{t}}:} = {\sqrt{\frac{1}{M}{\sum\limits_{i = 1}^{M}\left( {{\hat{\mu}}_{t}^{(i)} - \Pi} \right)^{2}}}.}} & (81)\end{matrix}$to approximate the true RMSE_(t). In our experiments, we set M=300 andfind that this leads to satisfactory results.

Embodiments of the present invention may use coordinate-ascent-basedAlgorithms 2 and 6 to optimize the circuit parameters {right arrow over(x)} in the ancilla-free and ancilla-based cases, respectively. Thisshows that Algorithms 1 and 2 produce solutions of equal quality, andthe same statement holds for Algorithms 5 and 6. So our experimentalresults would not change if we had used gradient-ascent-based Algorithms1 and 5 to tune the circuit angles {right arrow over (x)} instead.

For Bayesian update with ELFs, embodiments of the present invention mayuse the methods in Section 4.2 and Appendix A.2 to compute the posteriormean and variance of θ in the ancilla-free and ancilla-based cases,respectively. In particular, during the sinusoidal fitting of ELFs,embodiments of the present invention may set Θ=σ, μ−0.8σ, . . . ,μ+0.8σ+σ(i.e. Θ contains 11 uniformly distributed points in [μ−σ, μ+σ])in Eqs. (68) and (148). We find that this is sufficient for obtaininghigh-quality sinusoidal fits of ELFs.

6. A Model for Noisy Algorithm Performance

Embodiments of the present invention may implement a model for theruntime needed to achieve a target mean-squared error in the estimate ofΠ as it is scaled to larger systems and run on devices with better gatefidelities. This model may be built on two main assumptions. The firstis that the growth rate of the inverse mean squared error iswell-described by half the inverse variance rate expression (c.f. Eq.(40)). The half is due to the conservative estimate that the varianceand squared bias contribute equally to the mean squared error(simulations from the previous section show that the squared bias tendsto be less than the variance). The second assumption is an empiricallower bound on the variance reduction factor, which is motivated bynumerical investigations of the Chebyshev likelihood function.

We carry out analysis for the MSE with respect to the estimate of θ. Wewill then convert the MSE of this estimate to an estimate of MSE withrespect to Π. Our strategy will be to integrate upper and lower boundsfor the rate expression R(μ, σ; ƒ, m) in Eq. (40) to arrive at boundsfor inverse MSE as a function of time.

To help our analysis we make the substitution m=T(L)=2L+1 andreparameterize the way noise is incorporated by introducing λ and α suchthat ƒ²=p ²p^(2L)=e^(−λ(2L+1)−α)=e^(−λm−α).

The upper and lower bounds on this rate expression are based on findingsfor the Chebyshev likelihood functions, where {right arrow over(x)}=(π/2)^(2L). Since the Chebyshev likelihood functions are a subsetof the engineered likelihood functions, a lower bound on the Chebyshevperformance gives a lower bound on the ELF performance. We leave as aconjecture that the upper bound for this rate in the case of ELF is asmall multiple (e.g. 1.5) of the upper bound we have established for theChebyshev rate.

The Chebyshev upper bound is established as follows. For fixed σ, λ, andm, one can show⁶ that the variance reduction factor achieves a maximumvalue of

=m² exp (−m²σ²−λm−α), occurring at μ=π/2. This expression is less thanm²e^(−m) ² ^(σ) ² , which achieves a maximum of

${\left( {e\sigma^{2}} \right)^{- 1}{at}m} = {\frac{1}{\sigma}.}$Thus, the factor 1/(1−σ²

) cannot exceed 1/(1−e⁻¹)≈1.582. Putting this all together, for fixed σ,λ, and m, the maximum rate is upper bounded as

${R\left( {\mu,{\sigma;\lambda},\alpha,m} \right)} \leq {\frac{em}{e - 1}{{\exp\left( {{{- m^{2}}\sigma^{2}} - {\lambda m} - \alpha} \right)}.}}$This follows from the fact that R is monotonic in V and that V ismaximized at μ=π/2. In practice, embodiments of the present inventionmay use a value of L which maximizes the inverse variance rate. The rateachieved by discrete L cannot exceed the value obtained when optimizingthe above upper bound over continuous value of m. This optimal value isrealized for 1/m=½(√{square root over (λ²+8σ²)}+λ). We define R(σ; λ, α)by evaluating R(π/2, σ; λ, α, m) at this optimum value,

$\begin{matrix}{{{\overset{\_}{R}\left( {{\sigma;\lambda},\alpha} \right)} = {\frac{2e^{{- \alpha} - 1}}{\sqrt{\lambda^{2}\ 8\sigma^{2}} + \lambda}{\exp\left( \frac{2\sigma^{2}}{{4\sigma^{2}} + \lambda^{2} + {\lambda^{2}\sqrt{{8\sigma^{2}/\lambda^{2}} + 1}}} \right)}}},} & (82)\end{matrix}$which gives the upper bound on the Chebyshev rate

$\begin{matrix}{{R_{C}^{*}\left( {\mu,{\sigma;\lambda},\alpha} \right)} = {{\max\limits_{L}{R\left( {\mu,{\sigma;\lambda},\alpha,m} \right)}} \leq {\frac{e}{e - 1}{{\overset{\_}{R}\left( {{\sigma;\lambda},\alpha} \right)}.}}}} & (83)\end{matrix}$

Embodiments of the present invention do not have an analytic lower boundon the Chebyshev likelihood performance. We can establish an empiricallower bound based on numerical checks. For any fixed L, the inversevariance rate is zero at the 2L+2 points μ∈{0, π/(2L+1), 2π/(2L+1), . .. , 2Lπ/(2L+1), π}. Since the rate is zero at these end points for allL, the global lower bound on R_(C)* is zero. However, we are notconcerned with the poor performance of the inverse variance rate nearthese end points. When we convert the estimator from {circumflex over(θ)} to {circumflex over (Π)}=cos {circumflex over (θ)}, the informationgain near these end point actually tends to a large value. For thepurpose of establishing useful bounds, we will restrict μ to be in therange [0.1π, 0.9π]. In the numerical tests⁷ we find that for allμ∈[0.1π, 0.9π], there is always a choice of L for which the inversevariance rate is above (e−1)²/e²≈0.40 times the upper bound. Puttingthese together, we have

$\begin{matrix}{{\frac{e - 1}{e}{\overset{\_}{R}\left( {{\sigma;\lambda},\alpha} \right)}} \leq {R_{C}^{*}\left( {\mu,{\sigma;\lambda},\alpha} \right)} \leq {\frac{e}{e - 1}{{\overset{\_}{R}\left( {{\sigma;\lambda},\alpha} \right)}.}}} & (84)\end{matrix}$

It is important to note that by letting m be continuous, certain valuesof σ and λ can lead to an optimal m for which L=(m−1)/2 is negative.Therefore, these results apply only in the case that λ≤1, which ensuresthat m≥1. We expect this model to break down in the large-noise regime(i.e. λ≥1). ⁷We searched over a uniform grid of 50000 values of 0, Lvalues from L*/3 to 3L*, where L* is to the optimized value used toarrive at Eq. 82, and σ and λ ranging over [10 ⁻¹, 10⁻², . . . , 10⁻⁵].For each (σ, λ) pair we found the θ for which the maximum inversevariance rate (over L) is a minimum. For all (σ, λ) pairs checked, thisworst-case rate was always between 0.4 and 0.5, with the smallest valuefound being R=0.41700368≥(e−1)²/e².⁶ For the Chebvshev likelihoodfunctions, we can express the variance reduction factor as

${\mathcal{V}\left( {\mu,{\sigma;f},\left( \frac{\pi}{2} \right)^{2L}} \right)} = {m_{L}^{2}/\left( {1 + {\left( {{f^{- 2}e^{m_{L}^{2}\sigma^{2}}} - 1} \right){\csc\limits^{2}\left( {m_{L}\mu} \right)}}} \right)}$whenever sin (m_(L)μ)≠0. Then, csc(m_(L)μ)≥1 implies that

(μ, σ; ƒ,

$\left. \left( \frac{\pi}{2} \right)^{2L} \right) \leq {f^{2}m_{L}^{2}{e^{{- m_{L}^{2}}\sigma^{2}}.}}$

For now, we will assume that the rate tracks the geometric mean of thesetwo bounds, i.e. R_(C)*(σ, λ, μ)=R(σ, λ), keeping in mind that the upperand lower bounds are small constant factors off of this.

Assume that the inverse variance grows continuously in time at a rategiven by the difference quotient expression captured by theinverse-variance rate,

$R^{*} = {\frac{d}{dt}{\frac{1}{\sigma^{2}}.}}$

Letting F=1/σ² denote this inverse variance, the rate equation above canbe recast as a differential equation for F,

$\begin{matrix}{\frac{dF}{dt} = {\frac{2e^{{- \alpha} - 1}}{{\lambda\sqrt{1 + {8/\left( {F\lambda^{2}} \right)}}} + \lambda}{{\exp\left( \frac{2}{4 + {\lambda^{2}F} + {\lambda^{2}F\sqrt{1 + {8/\left( {\lambda^{2}F} \right)}}}} \right)}.}}} & (85)\end{matrix}$

Through this expression, we can identify both the Heisenberg limitbehavior and shot-noise limit behavior. For F«1/λ², the differentialequation becomes

$\begin{matrix}{{\frac{dF}{dt} = {\frac{e^{{- \alpha} - {1/2}}}{\sqrt{2}}\sqrt{F}}},} & (86)\end{matrix}$which integrates to a quadratic growth of the inverse squared errorF(t)˜t². This is the signature of the Heisenberg limit regime. ForF»1/λ², the rate approaches a constant,

$\begin{matrix}{\frac{dF}{dt} = {\frac{e^{{- \alpha} - 1}}{\lambda}.}} & (87)\end{matrix}$

This regime yields a linear growth in the inverse squared error F(t)˜t,indicative of the shot-noise limit regime.

In order to make the integral tractable, the rate expression may bereplaced with integrable upper and lower bound expressions (to be usedin tandem with our previous bounds). Letting x=λ²F, these bounds arere-expressed as,

$\begin{matrix}{\frac{2e^{{- \alpha} - 1}\lambda}{1 + {1/\sqrt{12x}} + {\left( {x + 4} \right)/\sqrt{x^{2} + {8x}}}} \geq \frac{dx}{dt} \geq {\frac{2e^{{- \alpha} - 1}\lambda}{1 + {1/\sqrt{4x}} + {\left( {x + 4} \right)/\sqrt{x^{2} + {8x}}}}.}} & (88)\end{matrix}$

From the upper bound we can establish a lower bound on the runtime, bytreating time as a function of x and integrating,

$\begin{matrix}{{\int_{0}^{t}{dt}} \geq {\int_{x_{0}}^{x_{f}}{{dx}\frac{e^{\alpha + 1}}{2\lambda}\left( {1 + {1/\sqrt{12x}} + {\left( {x + 4} \right)/\sqrt{x^{2} + {8x}}}} \right)}}} & (89)\end{matrix}$ $\begin{matrix}{= {\frac{e^{\alpha + 1}}{2\lambda}{\left( {x_{f} + \sqrt{x_{f}/3} + {\frac{1}{2}\sqrt{x_{f}^{2} + {8x_{f}}}} - x_{0} - \sqrt{x_{0}/3} - {\frac{1}{2}\sqrt{x_{0}^{2} + {8x_{0}}}}} \right).}}} & (90)\end{matrix}$

Similarly, we can use the lower bound to establish an upper bound on theruntime. Here we introduce our assumption that, in the worst case, theMSE of the phase estimate ε_(θ) ² is twice the variance (i.e. thevariance equals the bias), so the variance must reach half the MSE:σ²=ε_(θ) ²/2=λ²/x. In the best case, we assume the bias in the estimateis zero and set ε_(θ) ²=λ²/x. We combine these bounds with the upper andlower bounds of Eq. (84) to arrive at the bounds on the estimationruntime as a function of target MSE,

$\begin{matrix}{{{\left( {e - 1} \right)\frac{e^{- \lambda}}{2{\overset{\_}{p}}^{2}}\left( {\frac{\lambda}{\varepsilon_{\theta}^{2}} + \frac{1}{\sqrt{3}\varepsilon_{\theta}} + \sqrt{\left( \frac{\lambda}{\varepsilon_{\theta}^{2}} \right)^{2} + \left( \frac{2\sqrt{2}}{\varepsilon_{\theta}} \right)^{2}}} \right)} \leq t_{\varepsilon_{\theta}} \leq {\frac{e^{2}}{e - 1}\frac{e^{- \lambda}}{{\overset{\_}{p}}^{2}}\left( {\frac{\lambda}{\varepsilon_{\theta}^{2}} + \frac{1}{\sqrt{2}\varepsilon_{\theta}} + \sqrt{\left( \frac{\lambda}{\varepsilon_{\theta}^{2}} \right)^{2} + \left( \frac{2}{\varepsilon_{beta}} \right)^{2}}} \right)}},} & (91)\end{matrix}$where θ∈[0.1π, 0.9π].

At this point, we can convert our phase estimate e back into theamplitude estimate {circumflex over (Π)}. The MSE with respect to theamplitude estimate ε_(Π) ² can be approximated in terms of the phaseestimate MSE as

$\begin{matrix}{{\varepsilon_{\Pi}^{2} = {{{\mathbb{E}}\left( {\hat{\Pi} - \Pi} \right)}^{2} = {{{{\mathbb{E}}\left( {{\cos\hat{\theta}} - {\cos\theta}} \right)}^{2} \approx {{\mathbb{E}}\left( {\left( {\hat{\theta} - \theta} \right)\frac{d\cos\theta}{d\theta}} \right)}^{2}} = {\varepsilon_{\theta}^{2}\sin\limits^{2}\theta}}}},} & (92)\end{matrix}$where we have assumed that the distribution of the estimator issufficiently peaked about θ to ignore higher-order terms. This leads toε_(θ) ²=ε_(Π) ²/(1−Π²), which can be substituted into the aboveexpressions for the bounds, which hold for Π∈[cos 0.9π, cos 0.1π]≈[−0.95, 0.95]. Dropping the estimator subscripts (as they onlycontribute constant factors), we can establish the runtime scaling inthe low-noise and high-noise limits,

$t_{\varepsilon} = \left\{ \begin{matrix}{O\left( {e^{\alpha}/\varepsilon} \right)} & {{\lambda\operatorname{<<}\varepsilon},(93)} \\{O\left( {e^{\alpha}\lambda/\varepsilon^{2}} \right)} & {{\lambda\operatorname{>>}\varepsilon},(94)}\end{matrix} \right.$observing that the Heisenberg-limit scaling and shot-noise limit scalingare each recovered.

We arrived at these bounds using properties of Chebyshev likelihoodfunctions. As we have shown in the previous section, by engineeringlikelihood functions, in many cases we can reduce estimation runtimes.Motivated by our numerical findings of the variance reduction factors ofengineered likelihood functions (see, e.g. FIG. 19 ), we conjecture thatusing engineered likelihood functions increases the worst caseinverse-variance rate in Eq. (84) to R(σ; λ, α)≤R_(C)(μ, σ; λ, α).

In order to give more meaning to this model, we will refine it to be interms of number of qubits n and two-qubit gate fidelities ƒ_(2Q). Weconsider the task of estimating the expectation value of a Pauli stringP with respect to state |A

. Assume that Π=

A|P|A

is very near zero so that ε²=ε_(Π) ²≈Π_(θ) ². Let the two-qubit gatedepth of each of the L layers be D. We model the total layer fidelity asp=ƒ_(2Q) ^(nD/2), where we have ignored errors due to single-qubitgates. From this, we have Δ=½ nD ln (1/ƒ_(2Q) and α=2 ln (1/p)−½ nDln(1/ƒ_(2Q). Putting these together, we arrive at the runtimeexpression,

$\begin{matrix}{t_{\varepsilon} = {e\frac{f_{2Q}^{{nD}/2}}{2{\overset{\_}{p}}^{2}}{\left( {\frac{{nD}\ln\left( {1/f_{2Q}} \right)}{2\varepsilon^{2}} + \frac{1}{\sqrt{6}\varepsilon} + \sqrt{\left( \frac{{nD}\ln\left( {1/f_{2Q}} \right)}{2\varepsilon^{2}} \right)^{2} + \left( \frac{2\sqrt{2}}{\varepsilon} \right)^{2}}} \right).}}} & (95)\end{matrix}$

Finally, we will put some meaningful numbers in this expression andestimate the required runtime in seconds as a function of two-qubit gatefidelities. To achieve quantum advantage we expect that the probleminstance will require on the order of n=100 logical qubits and that thetwo-qubit gate depth is on the order of the number of qubits, D=200.Furthermore, we expect that target accuracies ε will need to be on theorder of ε=10⁻³ to 10⁻⁵. The runtime model measures time in terms ofansatz circuit durations. To convert this into seconds we assume eachlayer of two-qubit gates will take time G=10⁻⁸ s, which is an optimisticassumption for today's superconducting qubit hardware. FIG. 26 showsthis estimated runtime as a function of two-qubit gate fidelity.

The two-qubit gate fidelities required to reduce runtimes into apractical region will most likely require error correction. Performingquantum error correction requires an overhead which increases theseruntimes. In designing quantum error correction protocols, it isessential that the improvement in gate fidelities is not outweighed bythe increase in estimation runtime. The proposed model gives a means ofquantifying this trade-off: the product of gate infidelity and(error-corrected) gate time should decrease as useful error correctionis incorporated. In practice, there are many subtleties which should beaccounted for to make a more rigorous statement. These includeconsidering the variation in gate fidelities among gates in the circuitand the varying time costs of different types of gates. Nevertheless,the cost analyses afforded by this simple model may be a useful tool inthe design of quantum gates, quantum chips, error correcting schemes,and noise mitigation schemes.

APPENDIX A. ANCILLA-BASED SCHEME

In this appendix, we present an alternative scheme, called theancilla-based scheme. In this scheme, the engineered likelihood function(ELF) is generated by the quantum circuit in FIG. 27 , in which {rightarrow over (x)}=(x₁, x₂, . . . , x_(2L−1), x_(2L))∈

are tunable parameters.

Assuming the circuit in FIG. 27 is noiseless, the engineered likelihoodfunction is given by

(d|θ;{right arrow over (x)})=½[1+(−1)^(d)Λ(θ;{right arrow over(x)})],∀d∈{0,1}  (96)whereΛ(θ;{right arrow over (x)})=Re(

A|Q(θ;{right arrow over (x)})|A

)  (97)is the bias of the likelihood function. It turns out that most of theargument in Section 3.1 still holds in the ancilla-based case, exceptthat Δ(θ; {right arrow over (x)}) is replaced with Λ(θ; {right arrowover (x)}). So we will use the same notation (e.g. |0

, |1

, X, Y, Z, Ī) as before, unless otherwise stated. In particular, when wetake the errors in the circuit in FIG. 27 into account, the noisylikelihood function is given by

(d|θ;ƒ,{right arrow over (x)})=½[1+(−1)^(d)ƒΛ(θ;{right arrow over(x)})],∀d∈{0,1}  (98)where ƒ is the fidelity of the process for generating the ELF. Notethat, however, there does exist a difference between Δ(θ; {right arrowover (x)}) and Λ(θ; {right arrow over (x)}), as the former istrigono-multiqudaratic—in {right arrow over (x)}, while the latter istrigono-multilinear in {right arrow over (x)}.

We will tune the circuit angles {right arrow over (x)} and performBayesian inference with the resultant ELFs in the same way as in Section3.2. In fact, the argument in Section 3.2 still holds in theancilla-based case, except that we need to replace Δ(θ; {right arrowover (x)}) with Λ(θ; {right arrow over (x)}). So we will use the samenotation as before, unless otherwise stated. In particular, we alsodefine the variance reduction factor

(μ, σ; ƒ, x) as in Eqs. (37) and (38), replacing Δ(θ; {right arrow over(x)}) with Λ(θ; {right arrow over (x)}). It can be shown,

$\begin{matrix}{{{{\mathcal{V}\left( {\mu,{\sigma;f},\overset{\rightarrow}{x}} \right)} \approx \left( {{\mu;f},\overset{\rightarrow}{x}} \right)} = \frac{{f^{2}\left( {\Lambda^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)} \right)}^{2}}{1 - {f^{2}\left( {\Lambda\left( {\theta;\overset{\rightarrow}{x}} \right)} \right)}^{2}}},{{when}\sigma{is}{small}},{and}} & (99)\end{matrix}$ $\begin{matrix}{{{\mathcal{V}\left( {\mu,{\sigma;f},\overset{\rightarrow}{x}} \right)} \approx {f^{2}\left( {\Lambda^{\prime}\left( {\mu;\overset{\rightarrow}{x}} \right)} \right)}^{2}},{{when}{both}{}\sigma{and}f{are}{{small}.}}} & (100)\end{matrix}$

Namely, the Fisher information and slope of the likelihood function

(d|θ; ƒ, {right arrow over (x)}) at θ=μ are two proxies of the variancereduction factor

(μ, σ; ƒ, {right arrow over (x)}) under reasonable assumptions. Sincethe direct optimization of

is hard in general, we will tune the parameters {right arrow over (x)}by optimizing these proxies instead.

A.1. Efficient Maximization of Proxies of the Variance Reduction Factor

Now we present efficient heuristic algorithms for maximizing two proxiesof the variance reduction factor

—the Fisher information and slope of the likelihood function

(d|θ; ƒ, {right arrow over (x)}). All of these algorithms make use ofthe following procedures for evaluating the CSD coefficient functions ofthe bias Λ(θ; {right arrow over (x)}) and its derivative Λ′(θ; {rightarrow over (x)}) with respect to x_(j) for j=1, 2, . . . , 2L.

A.1.1. Evaluating the CSD Coefficient Functions of the Bias and itsDerivative

Since Λ(θ; {right arrow over (x)}) is trigono-multilinear in {rightarrow over (x)}, for any j∈{1, 2, . . . , 2L}, there exist functionsC_(j)(θ; {right arrow over (x)}_(¬j)) and S_(j)(θ; {right arrow over(x)}_(¬j)) which are trigono-multilinear in {right arrow over(x)}_(¬j)=(x₁, . . . , x_(j−1), x_(j+1), x_(2L)), such thatΛ(θ;{right arrow over (x)})=C _(j)(θ;{right arrow over (x)} _(¬j))cos(x_(j))+S _(j)(θ;{right arrow over (x)} _(¬j))sin(x _(j)).  (101)It follows thatΛ(θ;{right arrow over (x)})=C _(j)′(θ;{right arrow over (x)} _(¬j))cos(x_(j))+S _(j)′(θ;{right arrow over (x)} _(¬j))sin(x _(j)).  (102)is also trigono-multilinear in {right arrow over (x)}, where C′_(j)(θ;{right arrow over (x)}_(¬j))=∂_(θ)C_(j)(θ; {right arrow over (x)}_(¬j))and S′_(j)(θ; {right arrow over (x)}_(¬j))=∂_(θ)S_(j)(θ; {right arrowover (x)}_(¬j)) are the derivatives of C_(j)(θ; {right arrow over(x)}_(¬j)) and S_(j)(θ; {right arrow over (x)}_(¬j)) with respect to θ,respectively.

Our optimization algorithms require efficient procedures for evaluatingC_(j)(θ; {right arrow over (x)}_(¬j))=∂_(θ)S_(j)(θ; {right arrow over(x)}_(¬j)) and S_(j)′(θ; {right arrow over (x)}_(¬j)) for given θ and{right arrow over (x)}_(¬j). It turns out that these tasks can beaccomplished in O(L) time.

Lemma 2. Given θ and each of C_(j)(θ; {right arrow over (x)}_(¬j)),S_(j)(θ; {right arrow over (x)}_(¬j)), C_(j)′(θ; {right arrow over(x)}_(¬j)) and S_(j)′(θ; {right arrow over (x)}_(¬j)) can be computed inO(L) time.

Proof. For convenience, we introduce the following notation. LetW_(2i)=V(x_(2L−2i)), W_(2i+1)=U(θ; x_(2L−2i−1)), for i=0, 1, . . . ,L−1. Furthermore, let W′_(j)=∂_(θ)W_(j) for j=0, 1, . . . , 2L−1. Notethat W′_(j)=0 if j is even. Then we define P_(a,b)=W_(a)W_(a+1) . . .W_(b) if 0≤a≤b≤2L −1, and P_(a,b)=I otherwise.

With this notation, it can be shown thatQ(θ;{right arrow over (x)})=P _(0,a−1) W _(a) P _(a+1,2)L−1,∀0≤a≤2L−1,  (103)andQ′(θ;{right arrow over (x)})=P _(0,0) W′ ₁ P _(2,2L−1) +P _(0,2) W′ ₃ P_(4,2L−1) + . . . P _(0,2L−4) W′ _(2L−3) P _(2L−2,2L−1) +P _(0,2L−2) W′_(2L−1).  (104)

In order to evaluate C_(j)(θ; {right arrow over (x)}_(¬j)), S_(j)(θ;{right arrow over (x)}_(¬j)), C_(j)′(θ; {right arrow over (x)}_(¬j)) andS_(j)′(θ; {right arrow over (x)}_(¬j)) for given θ and {right arrow over(x)}_(¬j), we consider the case j is even and the case j is oddseparately.

-   -   Case 1: j=2(L−t) is even, where 0≤t≤L−1. In this case,        W_(2t)=V(X_(j)). Using the fact

$\begin{matrix}{{Q\left( {\theta;\overset{\rightarrow}{x}} \right)} = {P_{0,{{2t} - 1}}W_{2t}P_{{{2t} + 1},{{2L} - 1}}}} & (105) \\{= {{P_{0,{{2t} - 1}}\left( {{\cos\mspace{14mu}\left( x_{j} \right)\overset{\_}{I}} - {i\mspace{14mu}\sin\mspace{14mu}\left( x_{j} \right)\overset{\_}{Z}}} \right)}P_{{{2t} + 1},{{2L} - 1}}}} & (106) \\{{= {{\cos\mspace{14mu}\left( x_{j} \right)P_{0,{{2t} - 1}}P_{{{2t} + 1},{{2L} - 1}}} - {i\mspace{14mu}\sin\mspace{14mu}\left( x_{j} \right)P_{0,{{2t} - 1}}\overset{\_}{Z}P_{{{2t} + 1},{{2L} - 1}}}}},} & (107)\end{matrix}$

-   -   we obtain        Λ(θ;{right arrow over (x)})=C _(j)(θ;{right arrow over (x)}        _(¬j))cos(x _(j))+S _(j)(θ;{right arrow over (x)} _(¬j))sin(x        _(j)),  (108)        where        C _(j)(θ;{right arrow over (x)} _(¬j))=Re(        0|P _(0,2t−1) P _(2t+1,2L−1)|0        ),  (109)        S _(j)(θ;{right arrow over (x)} _(¬j))=Im(        0|P _(0,2t−1) ZP _(2t+1,2L−1)|0        ),  (110)    -   Given θ and {right arrow over (x)}_(¬j), we first compute        P_(0,2t−1) and P_(2t+1,2L−1) in O(L) time.

Then we calculate C_(j)(θ; {right arrow over (x)}_(¬j)) and S_(j)(θ;{right arrow over (x)}_(¬j)) by Eqs. (109) and (110). This proceduretakes only O(L) time.

Next, we describe how to compute C′_(j)(θ; {right arrow over (x)}_(¬j))and S′_(j)(θ; {right arrow over (x)}_(¬j)). Using Eq. (104) and the factP_(a,b)=P_(a,2t−1) W_(2t)P_(2t+1,b), for any a≤2t≤b, we obtain

$\begin{matrix}{{Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)} = {{P_{0,0}W_{1}^{\prime}P_{2,{{2t} - 1}}W_{2t}P_{{{2t} + 1},{{2L} - 1}}} + {P_{0,2}W_{3}^{\prime}P_{4,{{2t} - 1}}W_{2t}P_{{{2t} + 1},{{2L} - 1}}} + \cdots + {P_{0,{{2t} - 2}}W_{{2t} - 1}^{\prime}W_{2t}P_{{{2t} + 1},{{2L} - 1}}} + {P_{0,{{2t} - 1}}W_{2t}W_{{2t} + 1}^{\prime}P_{{{2t} + 2},{{2L} - 1}}} + \cdots + {P_{0,{{2t} - 1}}W_{2t}P_{{{2t} + 1},{{2L} - 4}}W_{{2L} - 3}^{\prime}P_{{{2L} - 2},{{2L} - 1}}} + {P_{0,{{2t} - 1}}W_{2t}P_{{{2t} + 1},{{2L} - 2}}{W_{{2L} - 1}^{\prime}.}}}} & (111)\end{matrix}$

Let

$\begin{matrix}{A_{t} = {\sum\limits_{s = 1}^{t}\;{P_{0,{{2s} - 1}}W_{{2s} - 1}^{\prime}P_{{2s},{{2t} - 1}}}}} & (112) \\{{= {\sum\limits_{s = 1}^{t}\;{P_{0,{{2s} - 2}}{U^{\prime}\left( {\theta;x_{{2L} - {2s} + 1}} \right)}P_{{2s},{{2t} - 1}}}}},} & (113) \\{B_{t} = {\sum\limits_{s = {t + 1}}^{L}\;{P_{{{2t} + 1},{{2s} - 2}}W_{{2s} - 1}^{\prime}P_{{2s},{{2L} - 1}}}}} & (114) \\{= {\sum\limits_{s = {t + 1}}^{L}\;{P_{{{2t} + 1},{{2s} - 2}}{U^{\prime}\left( {\theta;x_{{2L} - {2s} + 1}} \right)}{P_{{2s},{{2L} - 1}}.}}}} & (115)\end{matrix}$

Then Eq. (111) yields

$\begin{matrix}{\mspace{76mu}{{Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)} = {{A_{t}W_{2t}P_{{{2t} + 1},{{2L} - 1}}} + {P_{0,{{2t} - 1}}W_{2t}B_{t}}}}} & (116) \\{\left. {= {{{A_{t}\left( {{\cos\mspace{14mu}\left( x_{j} \right)\overset{\_}{I}} - {i\mspace{14mu}\sin\mspace{14mu}\left( x_{j} \right)\overset{\_}{Z}}} \right)}P_{{{2t} + 1},{{2L} - 1}}} + {{P_{0,{{2t} - 1}}\left( {\cos\mspace{14mu}\left( x_{j} \right)} \right)}\overset{\_}{I}} - {i\mspace{14mu}\sin\mspace{14mu}\left( x_{j} \right)\overset{\_}{Z}}}} \right)B_{t}} & (117) \\{{= {{\cos\mspace{14mu}\left( x_{j} \right)\left( {{A_{t}P_{{{2t} + 1},{{2L} - 1}}} + {P_{0,{{2t} - 1}}B_{t}}} \right)} - {i\mspace{14mu}\sin\mspace{14mu}\left( x_{j} \right)\left( {{A_{t}\overset{\_}{Z}P_{{{2t} + 1},{{2L} - 1}}} + {P_{0,{{2t} - 1}}\overset{\_}{Z}B_{t}}} \right)}}},} & (118)\end{matrix}$which leads toΛ′(θ;{right arrow over (x)})=C′ _(j)(θ;{right arrow over (x)}_(¬j))cos(x _(j))+S′ _(j)(θ;{right arrow over (x)} _(¬j))sin(x_(j)),  (119)whereC′ _(j)(θ;{right arrow over (x)} _(¬j))=Re(

0|(A _(t) P _(2t+1,2L−1) +P _(0,2t−1) B _(t))|0

),  (120)S′ _(j)(θ;{right arrow over (x)} _(¬j))=Im(

0|(A _(t) ZP _(2t+1,2L−1) P _(0,2t−1) ZB _(t))|0

).  (121)

Given θ and {right arrow over (x)}_(¬j), we first compute the followingmatrices in a total of O(L) time by standard dynamic programmingtechniques:P _(0,2s−2) and P _(2s,2t−1) for s=1,2, . . . ,t;P _(2t+1,2s−2) and P _(2,2L−1) for s=t+1,t+2, . . . ,L;P _(0,2t−1) and P _(2t+1,2L−1).

Then we compute A_(t) and B_(t) by Eqs. (113) and (115). After that, wecalculate C′_(j)(θ; {right arrow over (x)}_(¬j)) and S′_(j)(θ; {rightarrow over (x)}_(¬j)) by Eqs. (120) and (121). Overall, this proceduretakes O(L) time.

-   -   1. Case 2: j=2 (L−t)−1 is odd, where 0≤t≤L−1. In this case,        W_(2t+1)=U(θ; x_(j)). Using the fact

$\begin{matrix}{{Q\left( {\theta;\overset{\rightarrow}{x}} \right)} = {P_{0,{2t}}W_{{2t} + 1}P_{{{2t} + 2},{{2L} - 1}}}} & (122) \\{= {{P_{0,{2t}}\left( {{\cos\mspace{14mu}\left( x_{j} \right)\overset{\_}{I}} - {i\mspace{14mu}\sin\mspace{14mu}\left( x_{j} \right){P(\theta)}}} \right)}P_{{{2t} + 2},{{2L} - 1}}}} & (123) \\{{= {{\cos\mspace{14mu}\left( x_{j} \right)P_{0,{2t}}P_{{{2t} + 2},{{2L} - 1}}} - {i\mspace{14mu}\sin\mspace{14mu}\left( x_{j} \right)P_{0,{2t}}{P(\theta)}P_{{{2t} + 2},{{2L} - 1}}}}},} & (124)\end{matrix}$

-   -   2. we obtain        Λ(θ;{right arrow over (x)})=C _(j)(θ;{right arrow over (x)}        _(¬j))cos(x _(j))+S _(j)(θ;{right arrow over (x)} _(¬j))sin(x        _(j)),  (125)    -   3. where        C _(j)(θ;{right arrow over (x)} _(¬j))=Re(        0|P _(0,2t) P _(2t+2,2L−1)|0        ),  (126)        S _(j)(θ;{right arrow over (x)} _(¬j))=Im(        0 |P _(0,2t) P(θ)P _(2t+2,2L−1)|0        ).  (127)    -   4. Given θ and {right arrow over (x)}_(¬j), we first compute        P_(0,2t) and P_(2t+2,2L−1) in O(L) time.    -   Then we calculate C_(j)(θ; {right arrow over (x)}_(¬j)) and        S_(j){right arrow over (x)}_(¬j)) by Eqs. (126) and (127). This        procedure takes only O(L) time.

Next, we describe how to compute C′_(j)(θ; {right arrow over (x)}_(¬j))and S′_(j)({right arrow over (x)}_(¬j)). Using Eq. (104) and the factP_(a,b)=P_(a,2t)W_(2t+1)P_(2t+2,b) for any a≤2t+1≤b, we get

$\begin{matrix}{{Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)} = {{P_{0,0}W_{1}^{\prime}P_{2,{2t}}W_{{2t} + 1}P_{{{2t} + 2},{{2L} - 1}}} + {P_{0,2}W_{3}^{\prime}P_{4,{2t}}W_{{2t} + 1}P_{{{2t} + 2},{{2L} - 1}}} + \cdots + {P_{0,{{2t} - 2}}W_{{2t} - 1}^{\prime}P_{{2t},{2t}}W_{{2t} + 1}P_{{{2t} + 2},{{2L} - 1}}} + {P_{0,{2t}}W_{{2t} + 1}^{\prime}P_{{{2t} + 2},{{2L} - 1}}} + {P_{0,{2t}}W_{{2t} + 1}P_{{{2t} + 2},{{2t} + 2}}W_{{2t} + 3}^{\prime}P_{{{2t} + 4},{{2L} - 1}}} + \cdots + {P_{0,{2t}}W_{{2t} + 1}P_{{{2t} + 2},{{2L} - 4}}W_{{2L} - 3}^{\prime}P_{{{2L} - 2},{{2L} - 1}}} + {P_{0,{2t}}W_{{2t} + 1}P_{{{2t} + 2},{{2L} - 2}}{W_{{2L} - 1}^{\prime}.}}}} & (128)\end{matrix}$

Let

$\begin{matrix}{A_{t} = {\sum\limits_{s = 1}^{t}\;{P_{0,{{2s} - 2}}W_{{2s} - 1}^{\prime}P_{{2s},{2t}}}}} & (129) \\{{= {\sum\limits_{s = 1}^{t}\;{P_{0,{{2s} - 2}}{U^{\prime}\left( {\theta;x_{{2L} - {2s} + 1}} \right)}P_{{2s},{2t}}}}},} & (130) \\{B_{t} = {\sum\limits_{s = {t + 2}}^{L}\;{P_{{{2t} + 2},{{2s} - 2}}W_{{2s} - 1}^{\prime}P_{{2s},{{2L} - 1}}}}} & (131) \\{= {\sum\limits_{s = {t + 2}}^{L}\;{P_{{{2t} + 2},{{2s} - 2}}{U^{\prime}\left( {\theta;x_{{2L} - {2s} + 1}} \right)}{P_{{2s},{{2L} - 1}}.}}}} & (132)\end{matrix}$

Then Eq. (128) yields

$\begin{matrix}{{Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)} = {{A_{t}W_{{2t} + 1}P_{{{2t} + 2},{{2L} - 1}}} + {P_{0,{2t}}W_{{2t} + 1}^{\prime}P_{{{2t} + 2},{{2L} - 1}}} + {P_{0,{2t}}W_{{2t} + 1}B_{t}}}} & (133) \\{= {{{A_{t}\left( {{\cos\mspace{14mu}\left( x_{j} \right)\overset{\_}{I}} - {i\mspace{14mu}\sin\mspace{14mu}\left( x_{j} \right){P(\theta)}}} \right)}P_{{{2t} + 2},{{2L} - 1}}} - {i\mspace{14mu}\sin\mspace{14mu}\left( x_{j} \right)P_{0,{2t}}{P^{\prime}(\theta)}P_{{{2t} + 2},{{2L} - 1}}} + {{P_{0,{2t}}\left( {{\cos\mspace{14mu}\left( x_{j} \right)\overset{\_}{I}} - {i\mspace{14mu}\sin\mspace{14mu}\left( x_{j} \right){P(\theta)}}} \right)}B_{t}}}} & (134) \\{{= {{\cos\mspace{14mu}\left( x_{j} \right)\left( {{A_{t}P_{{{2t} + 2},{{2L} - 1}}} + {P_{0,{2t}}B_{t}}} \right)} - {i\mspace{14mu}\sin\mspace{14mu}\left( x_{j} \right)\left( {{A_{t}{P(\theta)}P_{{{2t} + 2},{{2L} - 1}}} + {P_{0,{2t}}{P^{\prime}(\theta)}P_{{{2t} + 2},{{2L} - 1}}} + {P_{0,{2t}}{P(\theta)}B_{t}}} \right)}}},} & (135)\end{matrix}$which leads toΛ′(θ;{right arrow over (x)})=C′ _(j)({right arrow over (x)} _(¬j))cos(x_(j))+S′ _(j)({right arrow over (x)} _(¬j))sin(x _(j)),  (136)where

$\begin{matrix}{\mspace{76mu}{{{C_{j}^{\prime}\left( {\theta;{\overset{\rightarrow}{x}}_{⫬ j}} \right)} = {{Re}\left( \left\langle {\overset{\_}{0}{\left( {{A_{t}P_{{{2t} + 2},{{2L} - 1}}} + {P_{0,{2t}}B_{t}}} \right)}\overset{\_}{0}} \right\rangle \right)}},}} & (137) \\{{S_{j}^{\prime}\left( {\theta;{\overset{\rightarrow}{x}}_{⫬ j}} \right)} = {{{Im}\left( \left\langle {\overset{\_}{0}{\left( {{A_{t}{P(\theta)}P_{{{2t} + 2},{{2L} - 1}}} + {P_{0,{2t}}{P^{\prime}(\theta)}P_{{{2t} + 2},{{2L} - 1}}} + {P_{0,{2t}}{P(\theta)}B_{t}}} \right)}\overset{\_}{0}} \right\rangle \right)}.}} & (138)\end{matrix}$

Given θ and {right arrow over (x)}_(¬j), we first compute the followingmatrices in a total of O(L) time by standard dynamic programmingtechniques:

-   -   P_(0,2s−2) and P_(2s,2t) for s=1,2, . . . , t;    -   P_(2t+2,2s−2) and P_(2,2L−1) for s=t+2, t+3, . . . , L;    -   P_(0,2t) and P_(2t+2,2L−1).

Then we compute A_(t) and B_(t) by Eqs. (130) and (132). After that, wecalculate C′_(j)({right arrow over (x)}_(¬j)) and S′_(j)({right arrowover (x)}_(¬j)) by Eqs. (137) and (138). Overall, this procedure takesO(L) time.

W

A.1.2. Maximizing the Fisher Information of the Likelihood Function

We propose two algorithms for maximizing the Fisher information of thelikelihood function

(d|θ; ƒ, {right arrow over (x)}) at a given point θ=μ(i.e. the priormean of θ). Namely, our goal is to find {right arrow over (x)}∈

^(2L) that maximize

$\begin{matrix}{{\left( {{\theta;f},\overset{\rightarrow}{x}} \right)} = {\frac{{f^{2}\left( {\Lambda^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)} \right)}^{2}}{1 - {f^{2}\left( {\Lambda\left( {\theta;\overset{\rightarrow}{x}} \right)} \right)}^{2}}.}} & (139)\end{matrix}$

These algorithms are similar to Algorithms 1 and 2 for Fisherinformation maximization in the ancilla-free case, in the sense thatthey are also based on gradient ascent and coordinate ascent,respectively. The main difference is that now we invoke the proceduresin Lemma 2 to evaluate C(μ; {right arrow over (x)}_(¬j)), S({right arrowover (x)}_(¬j)), C′({right arrow over (x)}_(¬j)) and S′({right arrowover (x)}_(¬j)) for given {right arrow over (x)}_(¬j), and then use themto either compute the partial derivative of

(μ; ƒ, {right arrow over (x)}) with respect to x_(j) (in gradientascent) or define a single-variable optimization problem for x_(j) (incoordinate ascent). These algorithms are formally described inAlgorithms 5 and 6.

A.1.3. Maximizing the Slope of the Likelihood Function

We also propose two algorithms for maximizing the slope of thelikelihood function

(d|θ; ƒ, {right arrow over (x)}) at a given point θ=μ(i.e. the priormean of θ). Namely, our goal is to find {right arrow over (x)}∈

^(2L) that maximize |

′(d|θ; ƒ, {right arrow over (x)})|=ƒ|Λ′(θ; {right arrow over (x)})|/2.

These algorithms are similar to Algorithms 3 and 4 for slopemaximization in the ancilla-free case, in the sense that they are alsobased on gradient ascent and coordinate ascent, respectively. The maindifference is that now we invoke the procedures in Lemma 2 to evaluateC′(μ; {right arrow over (x)}_(¬j)) and S′(μ; {right arrow over(x)}_(¬j)) for given μ and {right arrow over (x)}_(¬j). Then we usethese quantities to either compute the partial derivative of (Λ(μ;{right arrow over (x)}))² with respect to x_(j) (in gradient ascent) ordirectly update the value of x_(j) (in coordinate ascent). Thesealgorithms are formally described in Algorithms 7 and 8.

A.2. Approximate Bayesian Inference with Engineered Likelihood Functions

With the algorithms for tuning the circuit parameters {right arrow over(x)} in place, we now briefly describe how to perform Bayesian inferenceefficiently with the resultant likelihood functions. The idea is similarto the one in Section 4.2 for the ancilla-free scheme. Suppose θ hasprior distribution

(μ, σ²), where σ«1/L, and the fidelity of the process for generating theELF is ƒ. We find that the parameters {right arrow over (x)}=(x₁, x₂, .. . , x_(2L)) that maximize

(μ; ƒ, {right arrow over (x)}) or (|Λ′(μ; {right arrow over (x)})|)satisfy the following property: When θ is close to μ, i.e. θ∈[μ−O(σ),μ+O(σ)], we have

$\begin{matrix}{{\left( {{{d❘\theta};f},\overset{\rightarrow}{x}} \right)} \approx \frac{1 + {\left( {- 1} \right)^{d}f\mspace{14mu}\sin\mspace{14mu}\left( {{r\;\theta} + b} \right)}}{2}} & (147)\end{matrix}$for some r, b∈

. We find the best-fitting r and b by solving the following leastsquares problem:

$\begin{matrix}{{\left( {r^{*},b^{*}} \right) = {\underset{r,b}{\arg\mspace{14mu}\min}\mspace{14mu}{\sum\limits_{\theta \in \Theta}{{{\arcsin\mspace{14mu}\left( {\Lambda\left( {\theta;\overset{\rightarrow}{x}} \right)} \right)} - {r\;\theta} - b}}^{2}}}},} & (148)\end{matrix}$where Θ={θ₁, θ₂, θ_(k)}⊆[μ−O(σ), μ+O(σ)]. This least-squares problem hasthe following analytical solution:

$\begin{matrix}{{\begin{pmatrix}r^{*} \\b^{*}\end{pmatrix} = {{A^{+}z} = {\left( {A^{T}A} \right)^{- 1}A^{T}z}}},{where}} & (149) \\{{A = \begin{pmatrix}\theta_{1} & 1 \\\theta_{2} & 1 \\\vdots & \vdots \\\theta_{k} & 1\end{pmatrix}},{z = {\begin{pmatrix}{\arcsin\mspace{14mu}\left( {\Lambda\left( {\theta_{1};\overset{\rightarrow\;}{x}} \right)} \right)} \\{\arcsin\mspace{14mu}\left( {\Lambda\left( {\theta_{1};\overset{\rightarrow}{x}} \right)} \right)} \\\vdots \\{\arcsin\mspace{14mu}\left( {\Lambda\left( {\theta_{k};\overset{\rightarrow}{x}} \right)} \right)}\end{pmatrix}.}}} & (156)\end{matrix}$

FIG. 34 illustrates an example of the true and fitted likelihoodfunctions.

Once we obtain the optimum r and b, we approximate the posterior meanand variance of θ with the ones for

$\begin{matrix}{{{\left( {{d❘\theta};f} \right)} = \frac{1 + {\left( {- 1} \right)^{d}f\mspace{14mu}\sin\mspace{14mu}\left( {{r\;\theta} + b} \right)}}{2}},} & (157)\end{matrix}$which have analytical formulas. Specifically, suppose θ has priordistribution

(μ_(k), σ_(k) ²) at round k. Let d_(k) be the measurement outcome and(r_(k), b_(k)) be the best-fitting parameters at this round. Then weapproximate the posterior mean and variance of θ by

$\begin{matrix}{\mspace{76mu}{{\mu_{k + 1} = {\mu_{k} + \frac{\left( {- 1} \right)^{d_{k}}{fe}^{{- r_{k}^{2}}\sigma_{k}^{2}\text{/}2}r_{k}\sigma_{k}^{2}\mspace{14mu}\cos\mspace{14mu}\left( {{r_{k}\mu_{k}} + b_{k}} \right)}{1 + {\left( {- 1} \right)^{d_{k}}{de}^{{- r_{k}^{2}}\sigma_{k}^{2}\text{/}2}\mspace{14mu}\sin\mspace{14mu}\left( {{r_{k}\mu_{k}} + b_{k}} \right)}}}},}} & (158) \\{\sigma_{k + 1}^{2} = {{\sigma_{k}^{2}\left( {1 - \frac{{fr}_{k}^{2}\sigma_{k}^{2}{e^{{- r_{k}^{2}}\sigma_{k}^{2}\text{/}2}\left\lbrack {{fe}^{{- r_{k}^{2}}\sigma_{k}^{2}\text{/}2} + {\left( {- 1} \right)^{d_{k}}\mspace{14mu}\sin\mspace{14mu}\left( {{r_{k}\mu_{k}} + b_{k}} \right)}} \right\rbrack}}{\left\lbrack {1 + {\left( {- 1} \right)^{d_{k}}{fe}^{{- r_{k}^{2}}\sigma_{k}^{2}\text{/}2}\mspace{14mu}\sin\mspace{14mu}\left( {{r_{k}\mu_{k}} + b_{k}} \right)}} \right\rbrack^{2}}} \right)}.}} & (159)\end{matrix}$

-   -   After that, we proceed to the next round, setting        (μ_(k+1), σ_(k+1) ²) as the prior distribution of θ for that        round. The approximation errors incurred by Eqs. (158) and (159)        are small and have negligible impact on the performance of the        whole algorithm for the same reason as in the ancilla-free case.        C. Proof of Lemma

For convenience, we introduce the following notation. LetW_(2i)=U^(†)(θ; x_(2i+1))=U(θ; −X_(2i+1)),W_(2i+1)=V^(†)(x_(2i+2))=V(−x_(2i+2)), W_(4L−2i)=U(θ; x_(2i+1)), andW_(4L−2i−1)=V(x_(2i+2)), for i=0, 1, . . . , L−1, and W_(2L)=P(θ).Furthermore, let W′_(j)=∂_(θ)W_(j) for j=0, 1, . . . , 4L. Note thatW′_(j)=0 if j is odd. Then we define P_(a,b)=W_(a)W_(a+1) . . . W_(b) if0≤a≤b≤4L, and P_(a,b)=I otherwise.

With this notation,Q(θ;{right arrow over (x)})^(†) =P _(0,a−1) W _(a) P_(a+1,2L−1),∀0≤a≤2L−1,Q(θ;{right arrow over (x)})=P _(2L+1,b−1) W _(b) P _(b+1,4L),∀2L+1≤b≤4L,Q(θ;{right arrow over (x)})^(†) P(θ)Q(θ;{right arrow over (x)})=P_(0,a−1) W _(a) P _(a+1,b−1) W _(b) P _(b+1,4L),∀0≤a<b≤4L.

Moreover, taking the derivative with respect to θ yields

${{{Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)} = {{\partial_{\theta}{Q\left( {\theta;\overset{\rightarrow}{x}} \right)}} = {{{V\left( x_{2L} \right)}{U^{\prime}\left( {\theta;x_{{2L} - 1}} \right)}{V\left( x_{{2L} - 2} \right)}{U\left( {\theta;x_{{2L} - 3}} \right)}\ldots{V\left( x_{4} \right)}{U\left( {\theta;x_{3}} \right)}{V\left( x_{2} \right)}{U\left( {\theta;x_{1}} \right)}} + {{V\left( x_{2L} \right)}{U\left( {\theta;x_{{2L} - 1}} \right)}{V\left( x_{{2L} - 2} \right)}{U^{\prime}\left( {\theta;x_{{2L} - 3}} \right)}\ldots{V\left( x_{4} \right)}{U\left( {\theta;x_{3}} \right)}{V\left( x_{2} \right)}{U\left( {\theta;x_{1}} \right)}} + \cdots + {{V\left( x_{2L} \right)}{U\left( {\theta;x_{{2L} - 1}} \right)}{V\left( x_{{2L} - 2} \right)}{U\left( {\theta;x_{{2L} - 3}} \right)}\ldots{V\left( x_{4} \right)}{U^{\prime}\left( {\theta;x_{3}} \right)}{V\left( x_{2} \right)}{U\left( {\theta;x_{1}} \right)}} + {{V\left( x_{2L} \right)}{U\left( {\theta;x_{{2L} - 1}} \right)}{V\left( x_{{2L} - 2} \right)}{U\left( {\theta;x_{{2L} - 3}} \right)}\ldots{V\left( x_{4} \right)}{U\left( {\theta;x_{3}} \right)}{V\left( x_{2} \right)}{U^{\prime}\left( {\theta;x_{1}} \right)}}}}},{where}}{{U^{\prime}\left( {\theta;\alpha} \right)} = {{\partial_{\theta}{U\left( {\theta;\alpha} \right)}} = {{{- i}\sin(\alpha){P^{\prime}(\theta)}} = {i\sin(\alpha)\left( {{\sin(\theta)\overset{\_}{Z}} - {\cos(\theta)\overset{\_}{X}}} \right)}}}}$is the derivative of U(θ; α) with respect to θ, in whichP′(θ)=−sin(θ) Z +cos(θ) Xis the derivative of P(θ) with respect to θ. It follows that

${Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)} = {{P_{{{2L} + 1},{{2L} + 1}}W_{{2L} + 2}^{\prime}P_{{{2L} + 3},{4L}}} + {P_{{{2L} + 1},{{2L} + 3}}W_{{2L} + 4}^{\prime}P_{{{2L} + 5},{4L}}} + \cdots + {P_{{{2L} + 1},{{4L} - 3}}W_{{4L} - 2}^{\prime}P_{{{4L} - 1},{4L}}} + {P_{{{2L} + 1},{{4L} - 1}}{W_{4L}^{\prime}.}}}$

The following facts will be useful. Suppose A, B and C are arbitrarylinear operators on the Hilbert space

=span{|0

,|1

}. Then by direct calculation, one can verify that

${{\left\langle {\overset{\_}{0}{❘{{{AV}\left( {- x} \right)}{{BV}(x)}C}❘}\overset{\_}{0}} \right\rangle = {\left\langle {\overset{\_}{0}{❘{{A\left\lbrack {{\cos(x)\overset{\_}{I}} + {i\sin(x)\overset{\_}{Z}}} \right\rbrack}{B\left\lbrack {{\cos(x)\overset{\_}{I}} - {i\sin(x)\overset{\_}{Z}}} \right\rbrack}C}❘}\overset{\_}{0}} \right\rangle = {\frac{1}{2}\left\lbrack {{\cos\left( {2x} \right)\left\langle {\overset{\_}{0}{❘{{A\left( {B - {\overset{\_}{Z}B\overset{\_}{Z}}} \right)}C}❘}\overset{\_}{0}} \right\rangle} - {i\sin\left( {2x} \right)\left\langle {\overset{\_}{0}{❘{{A\left( {{B\overset{\_}{Z}} - {\overset{\_}{Z}B}} \right)}C}❘}\overset{\_}{0}} \right\rangle} + \left\langle {\overset{\_}{0}{❘{{A\left( {B + {\overset{\_}{Z}B\overset{\_}{Z}}} \right)}C}❘}\overset{\_}{0}} \right\rangle} \right\rbrack}}},{\left\langle {\overset{\_}{0}{❘{{{AU}\left( {\theta;{- x}} \right)}{{BU}\left( {\theta;x} \right)}C}❘}\overset{\_}{0}} \right\rangle = {\left\langle {\overset{\_}{0}{❘{{A\left\lbrack {{\cos(x)\overset{\_}{I}} + {i\sin(x){P(\theta)}}} \right\rbrack}{B\left\lbrack {{\cos(x)\overset{\_}{I}} - {i\sin(x){P(\theta)}}} \right\rbrack}C}❘}\overset{\_}{0}} \right\rangle = {\frac{1}{2}\left\lbrack {{\cos\left( {2x} \right)\left\langle {\overset{\_}{0}{❘{{A\left( {B - {{P(\theta)}{{BP}(\theta)}}} \right)}C}❘}\overset{\_}{0}} \right\rangle} - {i\sin\left( {2x} \right)\left\langle {\overset{\_}{0}{❘{{A\left( {{{BP}(\theta)} - {{P(\theta)}B}} \right)}C}❘}\overset{\_}{0}} \right\rangle} + \left\langle {\overset{\_}{0}{❘{{A\left( {B + {{P(\theta)}{{BP}(\theta)}}} \right)}C}❘}\overset{\_}{0}} \right\rangle} \right\rbrack}}},{and}}{\left\langle {\overset{\_}{0}{❘{{{AU}\left( {\theta;{- x}} \right)}{{BU}^{\prime}\left( {\theta;x} \right)}C}❘}\overset{\_}{0}} \right\rangle = {\left\langle {\overset{\_}{0}{❘{{A\left\lbrack {{\cos(x)\overset{\_}{I}} + {i\sin(x){P(\theta)}}} \right\rbrack}{B\left\lbrack {{- i}\sin(x){P^{\prime}(\theta)}} \right\rbrack}C}❘}\overset{\_}{0}} \right\rangle = {{\frac{1}{2}\left\lbrack {{{- \cos}\left( {2x} \right)\left\langle {\overset{\_}{0}{❘{{{AP}(\theta)}{{BP}^{\prime}(\theta)}C}❘}\overset{\_}{0}} \right\rangle} - {i\sin\left( {2x} \right)\left\langle {\overset{\_}{0}{❘{{{ABP}^{\prime}(\theta)}C}❘}\overset{\_}{0}} \right\rangle} + \left\langle {\overset{\_}{0}{❘{{{AP}(\theta)}{{BP}^{\prime}(\theta)}C}❘}\overset{\_}{0}} \right\rangle} \right\rbrack}.}}}$

The following fact will be also useful. Taking the derivative withrespect to θ yields

${\Delta^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)} = {{\left\langle {\overset{\_}{0}{❘{{Q^{\dagger}\left( {\theta;\overset{\rightarrow}{x}} \right)}{P(\theta)}{Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)}}❘}\overset{\_}{0}} \right\rangle + \left\langle {\overset{\_}{0}{❘{{Q^{\dagger}\left( {\theta;\overset{\rightarrow}{x}} \right)}{P^{\prime}(\theta)}{Q\left( {\theta;\overset{\rightarrow}{x}} \right)}}❘}\overset{\_}{0}} \right\rangle + \left\langle {\overset{\_}{0}{❘{\left( {Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)} \right)^{\dagger}{P(\theta)}{Q\left( {\theta;\overset{\rightarrow}{x}} \right)}}❘}\overset{\_}{0}} \right\rangle} = {{2{{Re}\left( \left\langle {\overset{\_}{0}{❘{{Q^{\dagger}\left( {{\theta l}\overset{\rightarrow}{x}} \right)}{P(\theta)}{Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)}}❘}\overset{\_}{0}} \right\rangle \right)}} + {\left\langle {\overset{\_}{0}{❘{{Q^{\dagger}\left( {\theta;\overset{\rightarrow}{x}} \right)}{P^{\prime}(\theta)}{Q\left( {\theta;\overset{\rightarrow}{x}} \right)}}❘}\overset{\_}{0}} \right\rangle.}}}$

In order to evaluate C_(j)(θ; {right arrow over (x)}_(¬j)), S_(j)(θ;{right arrow over (x)}_(¬j)), B_(j)(θ; {right arrow over (x)}_(¬j)),C′_(j)(θ; {right arrow over (x)}_(¬j)), S′_(j)(θ; {right arrow over(x)}_(¬j)) and B′_(j)(θ; {right arrow over (x)}_(¬j)) for given θ and{right arrow over (x)}_(¬j), we consider the case j is even and the casej is odd separately.

-   -   Case 1: j=2(t+1) is even, where 0≤t≤L−1. In this case,        W_(2t+1)=V(x_(j)), and W_(4L−2t−1)=V(x_(j)). Then we obtain

${{{\Delta\left( {\theta;\overset{\rightarrow}{x}} \right)} = {\left\langle {\overset{\_}{0}{❘{P_{0,{2t}}{C\left( {- x_{j}} \right)}P_{{{2t} + 2},{{4L} - {2t} - 2}}{V\left( x_{j} \right)}P_{{{4L} - {2t}},{4L}}}❘}\overset{\_}{0}} \right\rangle = {{C_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)} + {{S_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)}\sin\left( {2x_{j}} \right)} + {B_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)}}}},{where}}{{C_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)} = {\frac{1}{2}\left\langle {{\overset{\_}{0}\left. ❘{{P_{0,{2t}}{C\left( {- x_{j}} \right)}P_{{{2t} + 2},{{4L} - {2t} - 2}}} - {\overset{\_}{Z}P_{{{2t} + 2},{{4L} - {2t} - 2}}\overset{\_}{Z}}} \right)P_{{{4L} - {2t}},{4L}}\left. ❘\overset{\_}{0} \right\rangle},{{S_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)} = {{- \frac{i}{2}}\left\langle {\overset{\_}{0}{❘{{{P_{0,{2t}}\left( {{P_{{{2t} + 2},{{4L} - {2t} - 2}}\overset{\_}{Z}} - {\overset{\_}{Z}P_{{{2t} + 2},{{4L} - {2t} - 2}}}} \right)}P_{{{4L} - {2t}},{4L}}\left. ❘\overset{\_}{0} \right\rangle},{{B_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)} = {\frac{1}{2}\left\langle {\overset{\_}{0}{❘{{P_{0,{2t}}\left( {P_{{{2t} + 2},{{4L} - {2t} - 2}} - {\overset{\_}{Z}P_{{{2t} + 2},{{4L} - {2t} - 2}}\overset{\_}{Z}}} \right)}P_{{{4L} - {2t}},{4L}}{\left. ❘\overset{\_}{0} \right\rangle.}}}} \right.}}}}} \right.}}} \right.}}$

-   -   Given θ and {right arrow over (x)}_(¬j), we first compute        P_(0,2t), P_(2t+2,4L−2t−2) and P_(4L−2t,4L) in O(L) time. Then        we calculate C_(j)(θ; {right arrow over (x)}_(¬j)), S_(j)(θ;        {right arrow over (x)}_(¬j)) and B_(j)(θ; {right arrow over        (x)}_(¬j)). This procedure takes only O(L) time.

Next, we show how to compute C′_(j)(θ; {right arrow over (x)}_(¬j)),S′_(j)(θ; {right arrow over (x)}_(¬j)) and B′_(j)(θ; {right arrow over(x)}_(¬j)). Using the above and the factP_(a,b)=P_(a,4L−2t−2)W_(4L−2t−1)P_(4L−2t,4L) for any a≤4L−2t−1≤b, weobtain

${Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)} = {{P_{{{2L} + 1},{{2L} + 1}}W_{{2L} + 2}^{\prime}P_{{{2L} + 3},{{4L} - {2t} - 2}}W_{{4L} - {2t} - 1}P_{{{4L} - {2t}},{4L}}} + {P_{{{2L} + 1},{L\ 3}}W_{{2L} + 4}^{\prime}P_{{{2L} + 5},{{4L} - {2t} - 2}}W_{{4L} - {2t} - 1}P_{{{4L} - {2t}},{4L}}} + \cdots + {P_{{{2L} + 1},{{4L} - {2t} - 3}}W_{{4L} - {2t} - 2}^{\prime}W_{{4L} - {2t} - 1}P_{{{4L} - {2t}},{4L}}} + {P_{{{2L} + 1},{{4L} - {2t} - 2}}W_{{4L} - {2t} - 1}W_{{4L} - {2t}}^{\prime}P_{{{4L} - {2t} + 1},{4L}}} + \cdots + {P_{{{2L} + 1},{{4L} - {2t} - 2}}W_{{4L} - {2t} - 1}P_{{{4L} - {2t}},{{4L} - 3}}W_{{4L} - 2}^{\prime}P_{{{4L} - 1},{4L}}} + {P_{{{2L} + 1},{{4L} - {2t} - 2}}W_{{4L} - {2t} - 1}P_{{{4L} - {2t}},{{4L} - 1}}{W_{4L}^{\prime}.}}}$

Then it follows that

${{{{Q\left( {\theta;\overset{\rightarrow}{x}} \right)}^{\dagger}{P(\theta)}{Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)}} = {{A_{t}^{(1)}W_{{2t} + 1}B_{t}^{(1)}W_{{4L} - {2t} - 1}C_{t}^{(1)}} + {A_{t}^{(2)}W_{{2t} + 1}B_{t}^{(2)}W_{{4L} - {2t} - 1}C_{t}^{(2)}}}},{= {{A_{t}^{(1)}{V\left( {- x_{j}} \right)}B_{t}^{(1)}{V\left( x_{j} \right)}C_{t}^{(1)}} + {A_{t}^{(2)}{V\left( {- x_{j}} \right)}B_{t}^{(2)}{V\left( x_{j} \right)}C_{t}^{(2)}}}},{where}}{{A_{t}^{(1)} = P_{0,{2t}}},{B_{t}^{(1)} = P_{{{2t} + 2},{{4L} - {2t} - 2}}},{C_{t}^{(1)} = {{\sum\limits_{k = 0}^{t}{P_{{{4L} - {2t}},{{4L} - {2k} - 1}}W_{{4L} - {2k}}^{\prime}P_{{{4L} - {2k} + 1},{4L}}}} = {\sum\limits_{k = 0}^{t}{P_{{{4L} - {2t}},{{4L} - {2k} - 1}}{U^{\prime}\left( {\theta;x_{{2k} + 1}} \right)}P_{{{4L} - {2k} + 1},{4L}}}}}},{A_{t}^{(2)} = P_{0,{2t}}},{B_{t}^{(2)} = {{\sum\limits_{k = {t + 1}}^{L - 1}{P_{{{2t} + 2},{{4L} - {2k} - 1}}W_{{4L} - {2k}}^{\prime}P_{{{4L} - {2k} + 1},{{4L} - {2t} - 2}}}} = {\sum\limits_{k = {t + 1}}^{L - 1}{P_{{{2t} + 2},{{4L} - {2k} - 1}}{U^{\prime}\left( {\theta;x_{{2k} + 1}} \right)}P_{{{4L} - {2k} + 1},{{4L} - {2t} - 2}}}}}},{C_{t}^{(2)} = {P_{{{4L} - {2t}},{4L}}.}}}$

Meanwhile, we have

${{{{Q\left( {\theta;\overset{\rightarrow}{x}} \right)}^{\dagger}{P^{\prime}(\theta)}{Q\left( {\theta;\overset{\rightarrow}{x}} \right)}} = {{A_{t}^{(3)}W_{{2t} + 1}B_{t}^{(3)}W_{{4L} - {2t} - 1}C_{t}^{(3)}} = {A_{t}^{(3)}{V\left( {- x_{j}} \right)}B_{t}^{(3)}{V\left( x_{j} \right)}C_{t}^{(3)}}}},{where}}{{A_{t}^{(3)} = P_{0,{2t}}},{B_{t}^{(3)} = {P_{{{2t} + 2},{{2L} - 1}}{P^{\prime}(\theta)}P_{{{2L} + 1},{{4L} - {2t} - 2}}}},{C_{t}^{(3)} = {P_{{{4L} - {2t}},{4L}}.}}}$

Combining the above facts yields

${{{\Delta^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)} = {{{C_{j}^{\prime}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)}\cos\left( {2x_{j}} \right)} + {{S_{j}^{\prime}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)}\sin\left( {2x_{j}} \right)} + {B_{j}^{\prime}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)}}},{where}}{{{C_{j}^{\prime}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)} = {{{Re}\left( \left\langle {\overset{\_}{0}{❘{{A_{t}^{(1)}\left( {B_{t}^{(1)} - {\overset{\_}{Z}B_{t}^{(1)}\overset{\_}{Z}}} \right)}C_{t}^{(1)}}❘}\overset{\_}{0}} \right\rangle \right)} + {{Re}\left( \left\langle {\overset{\_}{0}{❘{{A_{t}^{(2)}\left( {B_{t}^{(2)} - {\overset{\_}{Z}B_{t}^{(2)}\overset{\_}{Z}}} \right)}C_{t}^{(2)}}❘}\overset{\_}{0}} \right\rangle \right)} + {\frac{1}{2}\left\langle {\overset{\_}{0}{❘{{A_{t}^{(3)}\left( {B_{t}^{(3)} - {\overset{\_}{Z}B_{t}^{(3)}\overset{\_}{Z}}} \right)}C_{t}^{(3)}}❘}\overset{\_}{0}} \right\rangle}}},{{S_{j}^{\prime}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)} = {{{Im}\left( \left\langle {\overset{\_}{0}{❘{{A_{t}^{(1)}\left( {B_{t}^{(1)} - {\overset{\_}{Z}B_{t}^{(1)}}} \right)}C_{t}^{(1)}}❘}\overset{\_}{0}} \right\rangle \right)} + {{Im}\left( \left\langle {\overset{\_}{0}{❘{{A_{t}^{(2)}\left( {{B_{t}^{(2)}\overset{\_}{Z}} - {\overset{\_}{Z}B_{t}^{(2)}}} \right)}C_{t}^{(2)}}❘}\overset{\_}{0}} \right\rangle \right)} - {\frac{i}{2}\left\langle {\overset{\_}{0}{❘{{A_{t}^{(3)}\left( {{B_{t}^{(3)}\overset{\_}{Z}} - {\overset{\_}{Z}B_{t}^{(3)}}} \right)}C_{t}^{(3)}}❘}\overset{\_}{0}} \right\rangle}}}}{{B_{j}^{\prime}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)} = {{{Re}\left( \left\langle {\overset{\_}{0}{❘{{A_{t}^{(1)}\left( {B_{t}^{(1)} + {\overset{\_}{Z}B_{t}^{(1)}\overset{\_}{Z}}} \right)}C_{t}^{(1)}}❘}\overset{\_}{0}} \right\rangle \right)} + {{Re}\left( \left\langle {\overset{\_}{0}{❘{{A_{t}^{(2)}\left( {B_{t}^{(2)} + {\overset{\_}{Z}B_{t}^{(2)}\overset{\_}{Z}}} \right)}C_{t}^{(2)}}❘}\overset{\_}{0}} \right\rangle \right)} + {\frac{1}{2}{\left\langle {\overset{\_}{0}{❘{{A_{t}^{(3)}\left( {B_{t}^{(3)} - {\overset{\_}{Z}B_{t}^{(3)}\overset{\_}{Z}}} \right)}C_{t}^{(3)}}❘}\overset{\_}{0}} \right\rangle.}}}}$

Given θ and {right arrow over (x)}_(¬j), we first compute the followingmatrices in a total of O(L) time by standard dynamic programmingtechnique:

-   -   P_(0,2t), P_(2t+2,4L−2t−2), P_(4L−2t,4L), P_(2t+2,2L−1),        P_(2L+1,4L−2t−2),    -   P_(4L−2t,4L−2k−1) and P_(4L−2k+1,4L) for k=0, 1, . . . , t;    -   P_(2t+2,4L−2k−1) and P_(4L−2k+1,4L−2t−2) for t+1,t+2, . . . ,        L−1.

Then we compute A_(t) ^((i)), B_(t) ^((i)) and C_(t) ^((i)) for i=1, 2,3. After that, we calculate C′_(j)(θ; {right arrow over (x)}_(¬j)),S′_(j)(θ; {right arrow over (x)}_(egj)) and B′_(j)(θ; {right arrow over(x)}_(¬j)). Overall, this procedure takes O(L) time.

-   -   5. Case 2: j=2t+1 is odd, where 0≤t≤L−1. In this case,        W_(2t)=U(θ; −x_(j)), and W_(4L−2t)=U(θ; x_(j)). They we get

${{\Delta\left( {\theta;\overset{\rightarrow}{x}} \right)} = {\left\langle {\overset{\_}{0}{❘{P_{0,{{2t} - 1}}{U\left( {\theta;x_{j}} \right)}P_{{{2t} + 1},{{4L} - {2t} - 1}}{U\left( {\theta;x_{j}} \right)}P_{{{4L} - {2t} + 1},{4L}}}❘}\overset{\_}{0}} \right\rangle = {{C_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)} + {{S_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)}\sin\left( {2x_{j}} \right)} + {B_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)}}}},$

-   -   6. where

${{C_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)} = {\frac{1}{2}\left\langle {\overset{\_}{0}{❘{{P_{0,{{2t} - 1}}\left( {P_{{{2t} + 1},{{4L} - {2t} - 2}} - {{P(\theta)}P_{{{2t} + 1},{{4L} - {2t} - 1}}{P(\theta)}}} \right)}P_{{{4L} - {2t} + 1},{4L}}}❘}\overset{\_}{0}} \right\rangle}},{{S_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)} = {{- \frac{i}{2}}\left\langle {\overset{\_}{0}{❘{{P_{0,{{2t} - 1}}\left( {{P_{{{2t} + 1},{{4L} - {2t} - 1}}{P(\theta)}} - {P(\theta)}_{{{2t} + 1},{{4L} - {2t} - 1}}} \right)}P_{{{4L} - {2t} + 1},{4L}}}❘}\overset{\_}{0}} \right\rangle}},{{B_{j}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)} = {\frac{1}{2}\left\langle {\overset{\_}{0}{❘{{P_{0,{{2t} - 1}}\left( {P_{{{2t} + 1},{{4L} - {2t} - 1}} + {{P(\theta)}P_{{{2t} + 1},{{4L} - {2t} - 1}}{P(\theta)}}} \right)}P_{{{4L} - {2t} + 1},{4L}}}❘}\overset{\_}{0}} \right\rangle}},$

-   -   7. Given θ and {right arrow over (x)}_(¬j), we first compute        P_(0,2t−1), P_(2t+1,4L−2t−1) and P_(4L−2t+1,4L) in O(L) time.        Then we calculate C_(j)(θ; {right arrow over (x)}_(¬j)),        S_(j)(θ; {right arrow over (x)}_(¬j)) and B_(j)(θ; {right arrow        over (x)}_(¬j). This procedure takes only O(L) time.

Next, we describe how to compute C′_(j)(θ; {right arrow over (x)}_(¬j)),S′_(j)(θ; {right arrow over (x)}_(¬j)) and B′_(j)(θ; {right arrow over(x)}_(¬j)). Using the above and the factP_(a,b)=P_(a,4L−2t−1)W_(4L−2t)P_(4L−2t+1,4L) for any a≤4L−2t≤b, weobtain

${Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)} = {{P_{{{2L} + 1},{{2L} + 1}}W_{{2L} + 2}^{\prime}P_{{{2L} + 3},{{4L} - {2t} - 1}}W_{{4L} - {2t}}P_{{{4L} - {2t} + 1},{4L}}} + {P_{{{2L} + 1},{{2L} + 3}}W_{{2L} + 4}^{\prime}P_{{{2L} + 5},{{4L} - {2t} - 1}}W_{{4L} - {2t}}P_{{{4L} - {2t} + 1},{4L}}} + \cdots + {P_{{{2L} + 1},{{4L} - {2t} - 3}}W_{{4L} - {2t} - 2}^{\prime}P_{{{4L} - {2t} - 1},{{4L} - {2t} - 1}}W_{{4L} - {2t}}P_{{{4L} - {2t} + 1},{4L}}} + {P_{{{2L} + 1},{{4L} - {2t} - 1}}W_{{4L} - {2t}}^{\prime}P_{{{4L} - {2t} + 1},{4L}}} + {P_{{{2L} + 1},{{4L} - {2t} - 1}}W_{{4L} - {2t}}P_{{{4L} - {2t} + 1},{{4L} - {t2} + 1}}W_{{4L} - {2t} + 2}^{\prime}P_{{{4L} - {2t} + 3},{4L}}} + \cdots + {P_{{{2L} + 1},{{4L} - {2t} - 1}}W_{{4L} - {2t}}P_{{{4L} - {2t} + 1},{{4L} - 3}}W_{{4L} - 2}^{\prime}P_{{{4L} - 1},{4L}}} + {P_{{{2L} + 1},{{4L} - {2t} - 1}}W_{{4L} - {2t}}P_{{{4L} - {2t} + 1},{{4L} - 1}}{W_{4L}^{\prime}.}}}$

Then it follows that

${{{{Q\left( {\theta;\overset{\rightarrow}{x}} \right)}^{\dagger}{P(\theta)}{Q^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)}} = {{A_{t}^{(1)}W_{2t}B_{t}^{(1)}W_{{4L} - {2t}}C_{t}^{(1)}} + {A_{t}^{(2)}W_{2t}B_{t}^{(2)}W_{{4L} - {2t}}C_{t}^{(2)}} + {A_{t}^{(3)}W_{2t}B_{t}^{(2)}W_{{4L} - {2t}}^{\prime}C_{t}^{(2)}}}},{= {{A_{t}^{(1)}{U\left( {\theta;{- x_{j}}} \right)}B_{t}^{(1)}{U\left( {\theta;x_{j}} \right)}C_{t}^{(1)}} + {A_{t}^{(2)}{U\left( {\theta;{- x_{j}}} \right)}B_{t}^{(2)}{U^{\prime}\left( {\theta;x_{j}} \right)}C_{t}^{(2)}} + {A_{t}^{(3)}{U\left( {\theta;{- x_{j}}} \right)}B_{t}^{(3)}{U\left( {\theta;x_{j}} \right)}C_{t}^{(3)}}}},{where}}{{A_{t}^{(1)} = P_{0,{{2t} - 1}}},{B_{t}^{(1)} = P_{{{2t} + 1},{{4L} - {2t} - 1}}},{C_{t}^{(1)} = {{\sum\limits_{k = 0}^{t - 1}{P_{{{4L} - {2t} + 1},{{4L} - {2k} - 1}}W_{{4L} - {2k}}^{\prime}P_{{{4L} - {2k} + 1},{4L}}}} = {\sum\limits_{k = 0}^{t - 1}{P_{{{4L} - {2t} + 1},{{4L} - {2k} - 1}}{U^{\prime}\left( {\theta;x_{{2k} + 1}} \right)}P_{{{4L} - {2k} + 1},{4L}}}}}},{A_{t}^{(2)} = P_{0,{{2t} - 1}}},{B_{t}^{(2)} = P_{{{2t} + 1},{{4L} - {2t} - 1}}}}{{C_{t}^{(2)} = P_{{{4L} - {2t} + 1},{4L}}},{A_{t}^{(3)} = P_{0,{{2t} - 1}}},{B_{t}^{(3)} = {{\sum\limits_{k = {t + 1}}^{L - 1}{P_{{{2t} + 1},{{4L} - {2k} - 1}}W_{{4L} - {2k}}^{\prime}P_{{{4L} - {2k} + 1},{{4L} - {2t} - 1}}}} = {\sum\limits_{k = {t + 1}}^{L - 1}{P_{{{2t} + 1},{{4L} - {2k} - 1}}{U^{\prime}\left( {\theta;x_{{2k} + 1}} \right)}P_{{{4L} - {2k} + 1},{{4L} - {2t} - 1}}}}}},{C_{t}^{(3)} = {P_{{{4L} - {2t} + 1},{4L}}.}}}$

Meanwhile, we have

${{{{Q\left( {\theta;\overset{\rightarrow}{x}} \right)}^{\dagger}{P^{\prime}(\theta)}{Q\left( {\theta;\overset{\rightarrow}{x}} \right)}} = {{A_{t}^{(4)}W_{2t}B_{t}^{(4)}W_{{4L} - {2t}}C_{t}^{(4)}} = {A_{t}^{(4)}{U\left( {\theta;{- x_{j}}} \right)}B_{t}^{(4)}{U\left( {\theta;x_{j}} \right)}C_{t}^{(4)}}}},{where}}{{A_{t}^{(4)} = P_{0,{{2t} - 1}}},{B_{t}^{(4)} = {P_{{{2t} + 1},{{2L} - 1}}{P^{\prime}(\theta)}P_{{{2L} + 1},{{4L} - {2t} - 1}}}},{C_{t}^{(4)} = {P_{{{4L} - {2t} + 1},{4L}}.}}}$

Combining the above facts yields

${{{\Delta^{\prime}\left( {\theta;\overset{\rightarrow}{x}} \right)} = {{{C_{j}^{\prime}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)}\cos\left( {2x_{j}} \right)} + {{S_{j}^{\prime}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)}\sin\left( {2x_{j}} \right)} + {B_{j}^{\prime}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)}}},{where}}{{{C_{j}^{\prime}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)} = {{{Re}\left( \left\langle {\overset{\_}{0}{❘{{A_{t}^{(1)}\left( {B_{t}^{(1)} - {{P(\theta)}B_{t}^{(1)}{P(\theta)}}} \right)}C_{t}^{(1)}}❘}\overset{\_}{0}} \right\rangle \right)} - {{Re}\left( \left\langle {\overset{\_}{0}{❘{A_{t}^{(2)}{P(\theta)}B_{t}^{(2)}{P^{\prime}(\theta)}C_{t}^{(2)}}❘}\overset{\_}{0}} \right\rangle \right)} + {{Re}\left( \left\langle {\overset{\_}{0}{❘{{A_{t}^{(3)}\left( {B_{t}^{(3)} - {{P(\theta)}B_{t}^{(3)}{P(\theta)}}} \right)}C_{t}^{(3)}}❘}\overset{\_}{0}} \right\rangle \right)} + {\frac{1}{2}\left\langle {\overset{\_}{0}{❘{{A_{t}^{(4)}\left( {B_{t}^{(4)} - {{P(\theta)}B_{t}^{(3)}{P(\theta)}}} \right)}C_{t}^{(4)}}❘}\overset{\_}{0}} \right\rangle}}},{{S_{j}^{\prime}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)} = {{{Im}\left( \left\langle {\overset{\_}{0}{❘{{A_{t}^{(1)}\left( {{B_{t}^{(1)}{P(\theta)}} - {{P(\theta)}B_{t}^{(1)}}} \right)}C_{t}^{(1)}}❘}\overset{\_}{0}} \right\rangle \right)} + {{Im}\left( \left\langle {\overset{\_}{0}{❘{A_{t}^{(2)}B_{t}^{(2)}{P^{\prime}(\theta)}C_{t}^{(2)}}❘}\overset{\_}{0}} \right\rangle \right)} + {{Im}\left( \left\langle {\overset{\_}{0}{❘{{A_{t}^{(3)}\left( {{B_{t}^{(3)}{P(\theta)}} - {{P(\theta)}B_{t}^{(3)}}} \right)}C_{t}^{(3)}}❘}\overset{\_}{0}} \right\rangle \right)} - {\frac{i}{2}\left\langle {\overset{\_}{0}{❘{{A_{t}^{(4)}\left( {{B_{t}^{(4)}{P(\theta)}} - {{P(\theta)}B_{t}^{(3)}}} \right)}C_{t}^{(4)}}❘}\overset{\_}{0}} \right\rangle}}}}{{B_{j}^{\prime}\left( {\theta;{\overset{\rightarrow}{x}}_{\neg j}} \right)} = {{{Re}\left( \left\langle {\overset{\_}{0}{❘{{A_{t}^{(1)}\left( {B_{t}^{(1)} + {{P(\theta)}B_{t}^{(1)}{P(\theta)}}} \right)}C_{t}^{(1)}}❘}\overset{\_}{0}} \right\rangle \right)} + {{Re}\left( \left\langle {\overset{\_}{0}{❘{A_{t}^{(2)}{P(\theta)}B_{t}^{(2)}{P^{\prime}(\theta)}C_{t}^{(2)}}❘}\overset{\_}{0}} \right\rangle \right)} + {{Re}\left( \left\langle {\overset{\_}{0}{❘{{A_{t}^{(3)}\left( {B_{t}^{(3)} + {{P(\theta)}B_{t}^{(3)}{P(\theta)}}} \right)}C_{t}^{(3)}}❘}\overset{\_}{0}} \right\rangle \right)} + {\frac{1}{2}{\left\langle {\overset{\_}{0}{❘{{A_{t}^{(4)}\left( {B_{t}^{(4)} - {{P(\theta)}B_{t}^{(3)}{P(\theta)}}} \right)}C_{t}^{(4)}}❘}\overset{\_}{0}} \right\rangle.}}}}$

Given θ and {right arrow over (x)}_(¬j), we first compute the followingmatrices in a total of O(L) time by standard dynamic programmingtechnique:

-   -   P_(0,2t−1), P_(2t+1,4L−2t−1), P_(4L−2t+1,4L), P_(2t+1,2L−1),        P_(2L+1,4L−2t−1;)    -   P_(4L−2t+1,4L−2k−1) and P_(4L−2k+1,4L) for k=0, 1, . . . , t−1;    -   P_(2t+1,4L−2k−1) and P_(4L−2k+1,4L−2t−1) for t+1,t+2, . . . ,        L−1.

Then we compute A_(t) ^((i)), B_(t) ^((i)), and C_(t) ^((i)) for i=1, 2,3, 4. After that, we calculate C′_(j)(θ; {right arrow over (x)}_(egj)),S′_(j)(θ; {right arrow over (x)}_(¬j)) and B′_(j)(θ; {right arrow over(x)}_(¬j)). Overall, this procedure takes O(L) time.

It is to be understood that although the invention has been describedabove in terms of particular embodiments, the foregoing embodiments areprovided as illustrative only, and do not limit or define the scope ofthe invention. Various other embodiments, including but not limited tothe following, are also within the scope of the claims. For example,elements and components described herein may be further divided intoadditional components or joined together to form fewer components forperforming the same functions.

Various physical embodiments of a quantum computer are suitable for useaccording to the present disclosure. In general, the fundamental datastorage unit in quantum computing is the quantum bit, or qubit. Thequbit is a quantum-computing analog of a classical digital computersystem bit. A classical bit is considered to occupy, at any given pointin time, one of two possible states corresponding to the binary digits(bits) 0 or 1. By contrast, a qubit is implemented in hardware by aphysical medium with quantum-mechanical characteristics. Such a medium,which physically instantiates a qubit, may be referred to herein as a“physical instantiation of a qubit,” a “physical embodiment of a qubit,”a “medium embodying a qubit,” or similar terms, or simply as a “qubit,”for ease of explanation. It should be understood, therefore, thatreferences herein to “qubits” within descriptions of embodiments of thepresent invention refer to physical media which embody qubits.

Each qubit has an infinite number of different potentialquantum-mechanical states. When the state of a qubit is physicallymeasured, the measurement produces one of two different basis statesresolved from the state of the qubit. Thus, a single qubit can representa one, a zero, or any quantum superposition of those two qubit states; apair of qubits can be in any quantum superposition of 4 orthogonal basisstates; and three qubits can be in any superposition of 8 orthogonalbasis states. The function that defines the quantum-mechanical states ofa qubit is known as its wavefunction. The wavefunction also specifiesthe probability distribution of outcomes for a given measurement. Aqubit, which has a quantum state of dimension two (i.e., has twoorthogonal basis states), may be generalized to a d-dimensional “qudit,”where d may be any integral value, such as 2, 3, 4, or higher. In thegeneral case of a qudit, measurement of the qudit produces one of ddifferent basis states resolved from the state of the qudit. Anyreference herein to a qubit should be understood to refer more generallyto a d-dimensional qudit with any value of d.

Although certain descriptions of qubits herein may describe such qubitsin terms of their mathematical properties, each such qubit may beimplemented in a physical medium in any of a variety of different ways.Examples of such physical media include superconducting material,trapped ions, photons, optical cavities, individual electrons trappedwithin quantum dots, point defects in solids (e.g., phosphorus donors insilicon or nitrogen-vacancy centers in diamond), molecules (e.g.,alanine, vanadium complexes), or aggregations of any of the foregoingthat exhibit qubit behavior, that is, comprising quantum states andtransitions therebetween that can be controllably induced or detected.

For any given medium that implements a qubit, any of a variety ofproperties of that medium may be chosen to implement the qubit. Forexample, if electrons are chosen to implement qubits, then the xcomponent of its spin degree of freedom may be chosen as the property ofsuch electrons to represent the states of such qubits. Alternatively,the y component, or the z component of the spin degree of freedom may bechosen as the property of such electrons to represent the state of suchqubits. This is merely a specific example of the general feature thatfor any physical medium that is chosen to implement qubits, there may bemultiple physical degrees of freedom (e.g., the x, y, and z componentsin the electron spin example) that may be chosen to represent 0 and 1.For any particular degree of freedom, the physical medium maycontrollably be put in a state of superposition, and measurements maythen be taken in the chosen degree of freedom to obtain readouts ofqubit values.

Certain implementations of quantum computers, referred as gate modelquantum computers, comprise quantum gates. In contrast to classicalgates, there is an infinite number of possible single-qubit quantumgates that change the state vector of a qubit. Changing the state of aqubit state vector typically is referred to as a single-qubit rotation,and may also be referred to herein as a state change or a single-qubitquantum-gate operation. A rotation, state change, or single-qubitquantum-gate operation may be represented mathematically by a unitary2×2 matrix with complex elements. A rotation corresponds to a rotationof a qubit state within its Hilbert space, which may be conceptualizedas a rotation of the Bloch sphere. (As is well-known to those havingordinary skill in the art, the Bloch sphere is a geometricalrepresentation of the space of pure states of a qubit.) Multi-qubitgates alter the quantum state of a set of qubits. For example, two-qubitgates rotate the state of two qubits as a rotation in thefour-dimensional Hilbert space of the two qubits. (As is well-known tothose having ordinary skill in the art, a Hilbert space is an abstractvector space possessing the structure of an inner product that allowslength and angle to be measured. Furthermore, Hilbert spaces arecomplete: there are enough limits in the space to allow the techniquesof calculus to be used.)

A quantum circuit may be specified as a sequence of quantum gates. Asdescribed in more detail below, the term “quantum gate,” as used herein,refers to the application of a gate control signal (defined below) toone or more qubits to cause those qubits to undergo certain physicaltransformations and thereby to implement a logical gate operation. Toconceptualize a quantum circuit, the matrices corresponding to thecomponent quantum gates may be multiplied together in the orderspecified by the gate sequence to produce a 2^(n)×2^(n) complex matrixrepresenting the same overall state change on n qubits. A quantumcircuit may thus be expressed as a single resultant operator. However,designing a quantum circuit in terms of constituent gates allows thedesign to conform to a standard set of gates, and thus enable greaterease of deployment. A quantum circuit thus corresponds to a design foractions taken upon the physical components of a quantum computer.

A given variational quantum circuit may be parameterized in a suitabledevice-specific manner. More generally, the quantum gates making up aquantum circuit may have an associated plurality of tuning parameters.For example, in embodiments based on optical switching, tuningparameters may correspond to the angles of individual optical elements.

In certain embodiments of quantum circuits, the quantum circuit includesboth one or more gates and one or more measurement operations. Quantumcomputers implemented using such quantum circuits are referred to hereinas implementing “measurement feedback.” For example, a quantum computerimplementing measurement feedback may execute the gates in a quantumcircuit and then measure only a subset (i.e., fewer than all) of thequbits in the quantum computer, and then decide which gate(s) to executenext based on the outcome(s) of the measurement(s). In particular, themeasurement(s) may indicate a degree of error in the gate operation(s),and the quantum computer may decide which gate(s) to execute next basedon the degree of error. The quantum computer may then execute thegate(s) indicated by the decision. This process of executing gates,measuring a subset of the qubits, and then deciding which gate(s) toexecute next may be repeated any number of times. Measurement feedbackmay be useful for performing quantum error correction, but is notlimited to use in performing quantum error correction. For every quantumcircuit, there is an error-corrected implementation of the circuit withor without measurement feedback.

Some embodiments described herein generate, measure, or utilize quantumstates that approximate a target quantum state (e.g., a ground state ofa Hamiltonian). As will be appreciated by those trained in the art,there are many ways to quantify how well a first quantum state“approximates” a second quantum state. In the following description, anyconcept or definition of approximation known in the art may be usedwithout departing from the scope hereof. For example, when the first andsecond quantum states are represented as first and second vectors,respectively, the first quantum state approximates the second quantumstate when an inner product between the first and second vectors (calledthe “fidelity” between the two quantum states) is greater than apredefined amount (typically labeled c). In this example, the fidelityquantifies how “close” or “similar” the first and second quantum statesare to each other. The fidelity represents a probability that ameasurement of the first quantum state will give the same result as ifthe measurement were performed on the second quantum state. Proximitybetween quantum states can also be quantified with a distance measure,such as a Euclidean norm, a Hamming distance, or another type of normknown in the art. Proximity between quantum states can also be definedin computational terms. For example, the first quantum stateapproximates the second quantum state when a polynomial time-sampling ofthe first quantum state gives some desired information or property thatit shares with the second quantum state.

Not all quantum computers are gate model quantum computers. Embodimentsof the present invention are not limited to being implemented using gatemodel quantum computers. As an alternative example, embodiments of thepresent invention may be implemented, in whole or in part, using aquantum computer that is implemented using a quantum annealingarchitecture, which is an alternative to the gate model quantumcomputing architecture. More specifically, quantum annealing (QA) is ametaheuristic for finding the global minimum of a given objectivefunction over a given set of candidate solutions (candidate states), bya process using quantum fluctuations.

FIG. 2B shows a diagram illustrating operations typically performed by acomputer system 250 which implements quantum annealing. The system 250includes both a quantum computer 252 and a classical computer 254.Operations shown on the left of the dashed vertical line 256 typicallyare performed by the quantum computer 252, while operations shown on theright of the dashed vertical line 256 typically are performed by theclassical computer 254.

Quantum annealing starts with the classical computer 254 generating aninitial Hamiltonian 260 and a final Hamiltonian 262 based on acomputational problem 258 to be solved, and providing the initialHamiltonian 260, the fmal Hamiltonian 262 and an annealing schedule 270as input to the quantum computer 252. The quantum computer 252 preparesa well-known initial state 266 (FIG. 2B, operation 264), such as aquantum-mechanical superposition of all possible states (candidatestates) with equal weights, based on the initial Hamiltonian 260. Theclassical computer 254 provides the initial Hamiltonian 260, a fmalHamiltonian 262, and an annealing schedule 270 to the quantum computer252. The quantum computer 252 starts in the initial state 266, andevolves its state according to the annealing schedule 270 following thetime-dependent Schrödinger equation, a natural quantum-mechanicalevolution of physical systems (FIG. 2B, operation 268). Morespecifically, the state of the quantum computer 252 undergoes timeevolution under a time-dependent Hamiltonian, which starts from theinitial Hamiltonian 260 and terminates at the fmal Hamiltonian 262. Ifthe rate of change of the system Hamiltonian is slow enough, the systemstays close to the ground state of the instantaneous Hamiltonian. If therate of change of the system Hamiltonian is accelerated, the system mayleave the ground state temporarily but produce a higher likelihood ofconcluding in the ground state of the final problem Hamiltonian, i.e.,diabatic quantum computation. At the end of the time evolution, the setof qubits on the quantum annealer is in a final state 272, which isexpected to be close to the ground state of the classical Ising modelthat corresponds to the solution to the original computational problem258. An experimental demonstration of the success of quantum annealingfor random magnets was reported immediately after the initialtheoretical proposal.

The final state 272 of the quantum computer 252 is measured, therebyproducing results 276 (i.e., measurements) (FIG. 2B, operation 274). Themeasurement operation 274 may be performed, for example, in any of theways disclosed herein, such as in any of the ways disclosed herein inconnection with the measurement unit 110 in FIG. 1 . The classicalcomputer 254 performs postprocessing on the measurement results 276 toproduce output 280 representing a solution to the original computationalproblem 258 (FIG. 2B, operation 278).

As yet another alternative example, embodiments of the present inventionmay be implemented, in whole or in part, using a quantum computer thatis implemented using a one-way quantum computing architecture, alsoreferred to as a measurement-based quantum computing architecture, whichis another alternative to the gate model quantum computing architecture.More specifically, the one-way or measurement based quantum computer(MBQC) is a method of quantum computing that first prepares an entangledresource state, usually a cluster state or graph state, then performssingle qubit measurements on it. It is “one-way” because the resourcestate is destroyed by the measurements.

The outcome of each individual measurement is random, but they arerelated in such a way that the computation always succeeds. In generalthe choices of basis for later measurements need to depend on theresults of earlier measurements, and hence the measurements cannot allbe performed at the same time.

Any of the functions disclosed herein may be implemented using means forperforming those functions. Such means include, but are not limited to,any of the components disclosed herein, such as the computer-relatedcomponents described below.

Referring to FIG. 1 , a diagram is shown of a system 100 implementedaccording to one embodiment of the present invention. Referring to FIG.2A, a flowchart is shown of a method 200 performed by the system 100 ofFIG. 1 according to one embodiment of the present invention. The system100 includes a quantum computer 102. The quantum computer 102 includes aplurality of qubits 104, which may be implemented in any of the waysdisclosed herein. There may be any number of qubits 104 in the quantumcomputer 102. For example, the qubits 104 may include or consist of nomore than 2 qubits, no more than 4 qubits, no more than 8 qubits, nomore than 16 qubits, no more than 32 qubits, no more than 64 qubits, nomore than 128 qubits, no more than 256 qubits, no more than 512 qubits,no more than 1024 qubits, no more than 2048 qubits, no more than 4096qubits, or no more than 8192 qubits. These are merely examples, inpractice there may be any number of qubits 104 in the quantum computer102.

There may be any number of gates in a quantum circuit. However, in someembodiments the number of gates may be at least proportional to thenumber of qubits 104 in the quantum computer 102. In some embodimentsthe gate depth may be no greater than the number of qubits 104 in thequantum computer 102, or no greater than some linear multiple of thenumber of qubits 104 in the quantum computer 102 (e.g., 2, 3, 4, 5, 6,or 7).

The qubits 104 may be interconnected in any graph pattern. For example,they be connected in a linear chain, a two-dimensional grid, anall-to-all connection, any combination thereof, or any subgraph of anyof the preceding.

As will become clear from the description below, although element 102 isreferred to herein as a “quantum computer,” this does not imply that allcomponents of the quantum computer 102 leverage quantum phenomena. Oneor more components of the quantum computer 102 may, for example, beclassical (i.e., non-quantum components) components which do notleverage quantum phenomena.

The quantum computer 102 includes a control unit 106, which may includeany of a variety of circuitry and/or other machinery for performing thefunctions disclosed herein. The control unit 106 may, for example,consist entirely of classical components. The control unit 106 generatesand provides as output one or more control signals 108 to the qubits104. The control signals 108 may take any of a variety of forms, such asany kind of electromagnetic signals, such as electrical signals,magnetic signals, optical signals (e.g., laser pulses), or anycombination thereof.

For example:

-   -   In embodiments in which some or all of the qubits 104 are        implemented as photons (also referred to as a “quantum optical”        implementation) that travel along waveguides, the control unit        106 may be a beam splitter (e.g., a heater or a mirror), the        control signals 108 may be signals that control the heater or        the rotation of the mirror, the measurement unit 110 may be a        photodetector, and the measurement signals 112 may be photons.    -   In embodiments in which some or all of the qubits 104 are        implemented as charge type qubits (e.g., transmon, X-mon, G-mon)        or flux-type qubits (e.g., flux qubits, capacitively shunted        flux qubits) (also referred to as a “circuit quantum        electrodynamic” (circuit QED) implementation), the control unit        106 may be a bus resonator activated by a drive, the control        signals 108 may be cavity modes, the measurement unit 110 may be        a second resonator (e.g., a low-Q resonator), and the        measurement signals 112 may be voltages measured from the second        resonator using dispersive readout techniques.    -   In embodiments in which some or all of the qubits 104 are        implemented as superconducting circuits, the control unit 106        may be a circuit QED-assisted control unit or a direct        capacitive coupling control unit or an inductive capacitive        coupling control unit, the control signals 108 may be cavity        modes, the measurement unit 110 may be a second resonator (e.g.,        a low-Q resonator), and the measurement signals 112 may be        voltages measured from the second resonator using dispersive        readout techniques.    -   In embodiments in which some or all of the qubits 104 are        implemented as trapped ions (e.g., electronic states of, e.g.,        magnesium ions), the control unit 106 may be a laser, the        control signals 108 may be laser pulses, the measurement unit        110 may be a laser and either a CCD or a photodetector (e.g., a        photomultiplier tube), and the measurement signals 112 may be        photons.    -   In embodiments in which some or all of the qubits 104 are        implemented using nuclear magnetic resonance (NMR) (in which        case the qubits may be molecules, e.g., in liquid or solid        form), the control unit 106 may be a radio frequency (RF)        antenna, the control signals 108 may be RF fields emitted by the        RF antenna, the measurement unit 110 may be another RF antenna,        and the measurement signals 112 may be RF fields measured by the        second RF antenna.    -   In embodiments in which some or all of the qubits 104 are        implemented as nitrogen-vacancy centers (NV centers), the        control unit 106 may, for example, be a laser, a microwave        antenna, or a coil, the control signals 108 may be visible        light, a microwave signal, or a constant electromagnetic field,        the measurement unit 110 may be a photodetector, and the        measurement signals 112 may be photons.    -   In embodiments in which some or all of the qubits 104 are        implemented as two-dimensional quasiparticles called “anyons”        (also referred to as a “topological quantum computer”        implementation), the control unit 106 may be nanowires, the        control signals 108 may be local electrical fields or microwave        pulses, the measurement unit 110 may be superconducting        circuits, and the measurement signals 112 may be voltages.    -   In embodiments in which some or all of the qubits 104 are        implemented as semiconducting material (e.g., nanowires), the        control unit 106 may be microfabricated gates, the control        signals 108 may be RF or microwave signals, the measurement unit        110 may be microfabricated gates, and the measurement signals        112 may be RF or microwave signals.

Although not shown explicitly in FIG. 1 and not required, themeasurement unit 110 may provide one or more feedback signals 114 to thecontrol unit 106 based on the measurement signals 112. For example,quantum computers referred to as “one-way quantum computers” or“measurement-based quantum computers” utilize such feedback signal 114from the measurement unit 110 to the control unit 106. Such feedbacksignal 114 is also necessary for the operation of fault-tolerant quantumcomputing and error correction.

The control signals 108 may, for example, include one or more statepreparation signals which, when received by the qubits 104, cause someor all of the qubits 104 to change their states. Such state preparationsignals constitute a quantum circuit also referred to as an “ansatzcircuit.” The resulting state of the qubits 104 is referred to herein asan “initial state” or an “ansatz state.” The process of outputting thestate preparation signal(s) to cause the qubits 104 to be in theirinitial state is referred to herein as “state preparation” (FIG. 2A,section 206). A special case of state preparation is “initialization,”also referred to as a “reset operation,” in which the initial state isone in which some or all of the qubits 104 are in the “zero” state i.e.the default single-qubit state. More generally, state preparation mayinvolve using the state preparation signals to cause some or all of thequbits 104 to be in any distribution of desired states. In someembodiments, the control unit 106 may first perform initialization onthe qubits 104 and then perform preparation on the qubits 104, by firstoutputting a first set of state preparation signals to initialize thequbits 104, and by then outputting a second set of state preparationsignals to put the qubits 104 partially or entirely into non-zerostates.

Another example of control signals 108 that may be output by the controlunit 106 and received by the qubits 104 are gate control signals. Thecontrol unit 106 may output such gate control signals, thereby applyingone or more gates to the qubits 104. Applying a gate to one or morequbits causes the set of qubits to undergo a physical state change whichembodies a corresponding logical gate operation (e.g., single-qubitrotation, two-qubit entangling gate or multi-qubit operation) specifiedby the received gate control signal. As this implies, in response toreceiving the gate control signals, the qubits 104 undergo physicaltransformations which cause the qubits 104 to change state in such a waythat the states of the qubits 104, when measured (see below), representthe results of performing logical gate operations specified by the gatecontrol signals. The term “quantum gate,” as used herein, refers to theapplication of a gate control signal to one or more qubits to causethose qubits to undergo the physical transformations described above andthereby to implement a logical gate operation.

It should be understood that the dividing line between state preparation(and the corresponding state preparation signals) and the application ofgates (and the corresponding gate control signals) may be chosenarbitrarily. For example, some or all the components and operations thatare illustrated in FIGS. 1 and 2A-2B as elements of “state preparation”may instead be characterized as elements of gate application.Conversely, for example, some or all of the components and operationsthat are illustrated in FIGS. 1 and 2A-2B as elements of “gateapplication” may instead be characterized as elements of statepreparation. As one particular example, the system and method of FIGS. 1and 2A-2B may be characterized as solely performing state preparationfollowed by measurement, without any gate application, where theelements that are described herein as being part of gate application areinstead considered to be part of state preparation. Conversely, forexample, the system and method of FIGS. 1 and 2A-2B may be characterizedas solely performing gate application followed by measurement, withoutany state preparation, and where the elements that are described hereinas being part of state preparation are instead considered to be part ofgate application.

The quantum computer 102 also includes a measurement unit 110, whichperforms one or more measurement operations on the qubits 104 to readout measurement signals 112 (also referred to herein as “measurementresults”) from the qubits 104, where the measurement results 112 aresignals representing the states of some or all of the qubits 104. Inpractice, the control unit 106 and the measurement unit 110 may beentirely distinct from each other, or contain some components in commonwith each other, or be implemented using a single unit (i.e., a singleunit may implement both the control unit 106 and the measurement unit110). For example, a laser unit may be used both to generate the controlsignals 108 and to provide stimulus (e.g., one or more laser beams) tothe qubits 104 to cause the measurement signals 112 to be generated.

In general, the quantum computer 102 may perform various operationsdescribed above any number of times. For example, the control unit 106may generate one or more control signals 108, thereby causing the qubits104 to perform one or more quantum gate operations. The measurement unit110 may then perform one or more measurement operations on the qubits104 to read out a set of one or more measurement signals 112. Themeasurement unit 110 may repeat such measurement operations on thequbits 104 before the control unit 106 generates additional controlsignals 108, thereby causing the measurement unit 110 to read outadditional measurement signals 112 resulting from the same gateoperations that were performed before reading out the previousmeasurement signals 112. The measurement unit 110 may repeat thisprocess any number of times to generate any number of measurementsignals 112 corresponding to the same gate operations. The quantumcomputer 102 may then aggregate such multiple measurements of the samegate operations in any of a variety of ways.

After the measurement unit 110 has performed one or more measurementoperations on the qubits 104 after they have performed one set of gateoperations, the control unit 106 may generate one or more additionalcontrol signals 108, which may differ from the previous control signals108, thereby causing the qubits 104 to perform one or more additionalquantum gate operations, which may differ from the previous set ofquantum gate operations. The process described above may then berepeated, with the measurement unit 110 performing one or moremeasurement operations on the qubits 104 in their new states (resultingfrom the most recently-performed gate operations).

In general, the system 100 may implement a plurality of quantum circuitsas follows. For each quantum circuit C in the plurality of quantumcircuits (FIG. 2A, operation 202), the system 100 performs a pluralityof “shots” on the qubits 104. The meaning of a shot will become clearfrom the description that follows. For each shot S in the plurality ofshots (FIG. 2A, operation 204), the system 100 prepares the state of thequbits 104 (FIG. 2A, section 206). More specifically, for each quantumgate G in quantum circuit C (FIG. 2A, operation 210), the system 100applies quantum gate G to the qubits 104 (FIG. 2A, operations 212 and214).

Then, for each of the qubits Q 104 (FIG. 2A, operation 216), the system100 measures the qubit Q to produce measurement output representing acurrent state of qubit Q (FIG. 2A, operations 218 and 220).

The operations described above are repeated for each shot S (FIG. 2A,operation 222), and circuit C (FIG. 2A, operation 224). As thedescription above implies, a single “shot” involves preparing the stateof the qubits 104 and applying all of the quantum gates in a circuit tothe qubits 104 and then measuring the states of the qubits 104; and thesystem 100 may perform multiple shots for one or more circuits.

Referring to FIG. 3 , a diagram is shown of a hybrid quantum-classical(HQC) computer 300 implemented according to one embodiment of thepresent invention. The HQC 300 includes a quantum computer component 102(which may, for example, be implemented in the manner shown anddescribed in connection with FIG. 1 ) and a classical computer component306. The classical computer component may be a machine implementedaccording to the general computing model established by John VonNeumann, in which programs are written in the form of ordered lists ofinstructions and stored within a classical (e.g., digital) memory 310and executed by a classical (e.g., digital) processor 308 of theclassical computer. The memory 310 is classical in the sense that itstores data in a storage medium in the form of bits, which have a singledefinite binary state at any point in time. The bits stored in thememory 310 may, for example, represent a computer program. The classicalcomputer component 304 typically includes a bus 314. The processor 308may read bits from and write bits to the memory 310 over the bus 314.For example, the processor 308 may read instructions from the computerprogram in the memory 310, and may optionally receive input data 316from a source external to the computer 302, such as from a user inputdevice such as a mouse, keyboard, or any other input device. Theprocessor 308 may use instructions that have been read from the memory310 to perform computations on data read from the memory 310 and/or theinput 316, and generate output from those instructions. The processor308 may store that output back into the memory 310 and/or provide theoutput externally as output data 318 via an output device, such as amonitor, speaker, or network device.

The quantum computer component 102 may include a plurality of qubits104, as described above in connection with FIG. 1 . A single qubit mayrepresent a one, a zero, or any quantum superposition of those two qubitstates. The classical computer component 304 may provide classical statepreparation signals Y32 to the quantum computer 102, in response towhich the quantum computer 102 may prepare the states of the qubits 104in any of the ways disclosed herein, such as in any of the waysdisclosed in connection with FIGS. 1 and 2A-2B.

Once the qubits 104 have been prepared, the classical processor 308 mayprovide classical control signals Y34 to the quantum computer 102, inresponse to which the quantum computer 102 may apply the gate operationsspecified by the control signals Y32 to the qubits 104, as a result ofwhich the qubits 104 arrive at a final state. The measurement unit 110in the quantum computer 102 (which may be implemented as described abovein connection with FIGS. 1 and 2A-2B) may measure the states of thequbits 104 and produce measurement output Y38 representing the collapseof the states of the qubits 104 into one of their eigenstates. As aresult, the measurement output Y38 includes or consists of bits andtherefore represents a classical state. The quantum computer 102provides the measurement output Y38 to the classical processor 308. Theclassical processor 308 may store data representing the measurementoutput Y38 and/or data derived therefrom in the classical memory 310.

The steps described above may be repeated any number of times, with whatis described above as the final state of the qubits 104 serving as theinitial state of the next iteration. In this way, the classical computer304 and the quantum computer 102 may cooperate as co-processors toperform joint computations as a single computer system.

Although certain functions may be described herein as being performed bya classical computer and other functions may be described herein asbeing performed by a quantum computer, these are merely examples and donot constitute limitations of the present invention. A subset of thefunctions which are disclosed herein as being performed by a quantumcomputer may instead be performed by a classical computer. For example,a classical computer may execute functionality for emulating a quantumcomputer and provide a subset of the functionality described herein,albeit with functionality limited by the exponential scaling of thesimulation. Functions which are disclosed herein as being performed by aclassical computer may instead be performed by a quantum computer.

The techniques described above may be implemented, for example, inhardware, in one or more computer programs tangibly stored on one ormore computer-readable media, firmware, or any combination thereof, suchas solely on a quantum computer, solely on a classical computer, or on ahybrid quantum-classical (HQC) computer. The techniques disclosed hereinmay, for example, be implemented solely on a classical computer, inwhich the classical computer emulates the quantum computer functionsdisclosed herein.

The techniques described above may be implemented in one or morecomputer programs executing on (or executable by) a programmablecomputer (such as a classical computer, a quantum computer, or an HQC)including any combination of any number of the following: a processor, astorage medium readable and/or writable by the processor (including, forexample, volatile and non-volatile memory and/or storage elements), aninput device, and an output device. Program code may be applied to inputentered using the input device to perform the functions described and togenerate output using the output device.

Embodiments of the present invention include features which are onlypossible and/or feasible to implement with the use of one or morecomputers, computer processors, and/or other elements of a computersystem. Such features are either impossible or impractical to implementmentally and/or manually. For example, it would be impossible tomentally or manually generate random samples from a complex distributiondescribing a operator P and state |s>.

Any claims herein which affirmatively require a computer, a processor, amemory, or similar computer-related elements, are intended to requiresuch elements, and should not be interpreted as if such elements are notpresent in or required by such claims. Such claims are not intended, andshould not be interpreted, to cover methods and/or systems which lackthe recited computer-related elements. For example, any method claimherein which recites that the claimed method is performed by a computer,a processor, a memory, and/or similar computer-related element, isintended to, and should only be interpreted to, encompass methods whichare performed by the recited computer-related element(s). Such a methodclaim should not be interpreted, for example, to encompass a method thatis performed mentally or by hand (e.g., using pencil and paper).Similarly, any product claim herein which recites that the claimedproduct includes a computer, a processor, a memory, and/or similarcomputer-related element, is intended to, and should only be interpretedto, encompass products which include the recited computer-relatedelement(s). Such a product claim should not be interpreted, for example,to encompass a product that does not include the recitedcomputer-related element(s).

In embodiments in which a classical computing component executes acomputer program providing any subset of the functionality within thescope of the claims below, the computer program may be implemented inany programming language, such as assembly language, machine language, ahigh-level procedural programming language, or an object-orientedprogramming language. The programming language may, for example, be acompiled or interpreted programming language.

Each such computer program may be implemented in a computer programproduct tangibly embodied in a machine-readable storage device forexecution by a computer processor, which may be either a classicalprocessor or a quantum processor. Method steps of the invention may beperformed by one or more computer processors executing a programtangibly embodied on a computer-readable medium to perform functions ofthe invention by operating on input and generating output. Suitableprocessors include, by way of example, both general and special purposemicroprocessors. Generally, the processor receives (reads) instructionsand data from a memory (such as a read-only memory and/or a randomaccess memory) and writes (stores) instructions and data to the memory.Storage devices suitable for tangibly embodying computer programinstructions and data include, for example, all forms of non-volatilememory, such as semiconductor memory devices, including EPROM, EEPROM,and flash memory devices; magnetic disks such as internal hard disks andremovable disks; magneto-optical disks; and CD-ROMs. Any of theforegoing may be supplemented by, or incorporated in, specially-designedASICs (application-specific integrated circuits) or FPGAs(Field-Programmable Gate Arrays). A classical computer can generallyalso receive (read) programs and data from, and write (store) programsand data to, a non-transitory computer-readable storage medium such asan internal disk (not shown) or a removable disk. These elements willalso be found in a conventional desktop or workstation computer as wellas other computers suitable for executing computer programs implementingthe methods described herein, which may be used in conjunction with anydigital print engine or marking engine, display monitor, or other rasteroutput device capable of producing color or gray scale pixels on paper,film, display screen, or other output medium.

Any data disclosed herein may be implemented, for example, in one ormore data structures tangibly stored on a non-transitorycomputer-readable medium (such as a classical computer-readable medium,a quantum computer-readable medium, or an HQC computer-readable medium).Embodiments of the invention may store such data in such datastructure(s) and read such data from such data structure(s).

What is claimed is:
 1. A method for improved quantum amplitudeestimation for reducing a number of measurements to generate a statisticaccurately, comprising: selecting, with a classical computer, aplurality of quantum-circuit-parameter values to optimize anaccuracy-improvement rate of the statistic estimating an expectationvalue

s|P|s

of an observable P with respect to a quantum state |s

; applying, to one or more qubits of a quantum computer, a sequence ofalternating first and second generalized reflection operators totransform the one or more qubits from the quantum state |s

into a reflected quantum state, each of the first and second generalizedreflection operators being controlled according to a corresponding oneof the plurality of quantum-circuit-parameter values; measuring theplurality of qubits in the reflected quantum state with respect to theobservable P to obtain a set of measurement outcomes; and updating, onthe classical computer, the statistic with the set of measurementoutcomes.
 2. The method of claim 1, further comprising outputting thestatistic after said updating.
 3. The method of claim 1, the statisticcomprising a mean.
 4. The method of claim 1, the accuracy-improvementrate comprising a variance-reduction factor.
 5. The method of claim 1,the accuracy-improvement rate comprising an information-improvementrate.
 6. The method of claim 5, the information-improvement ratecomprising one or a Fisher-information-improvement rate and anentropy-reduction rate.
 7. The method of claim 1, wherein the sequenceof first and second generalized reflection operators and the observablep define a bias of an engineered likelihood function.
 8. The method ofclaim 1, further comprising iterating said selecting, applying,measuring, and updating.
 9. The method of claim 1, further comprising:updating, on the classical computer and with the set of measurementoutcomes, an accuracy estimate of the statistic; and iterating saidselecting, applying, measuring, and updating while the accuracy estimateis greater than a threshold.
 10. The method of claim 1, wherein saidupdating the statistic comprises: updating a prior distribution with theplurality of measurements to obtain a posterior distribution; andcalculating an updated statistic from the posterior distribution. 11.The method of claim 1, wherein said selecting is based on the statisticand an accuracy estimate of the statistic.
 12. The method of claim 11,wherein said selecting is further based on a fidelity representingerrors occurring during said applying and measuring.
 13. The method ofclaim 1, wherein said selecting uses one of coordinate ascent andgradient descent.
 14. A computing system for improved quantum amplitudeestimation for reducing a number of measurements to generate a statisticaccurately, the computing system comprising: a processor; aquantum-classical interface communicably coupling the computing systemwith a quantum computer; and a memory communicably coupled with theprocessor, the memory storing machine-readable instructions that, whenexecuted by the processor, control the computing system to: (i) select aplurality of quantum-circuit-parameter values to optimize anaccuracy-improvement rate of the statistic estimating an expectationvalue

s|P|s

of an observable P with respect to a quantum state |s

, (ii) control the quantum computer, via the quantum-classicalinterface, to transform one or more qubits of the quantum computer fromthe quantum state |s

into a reflected quantum state using a sequence of alternating first andsecond generalized reflection operators, each of the first and secondgeneralized reflection operators being controlled according to acorresponding one of the plurality of quantum-circuit-parameter values,(iii) control the quantum computer, via the quantum-classical interface,to measure the plurality of qubits in the reflected quantum state withrespect to the observable P to obtain a set of measurement outcomes, and(iv) update the statistic with the set of measurement outcomes.
 15. Thecomputing system of claim 14, the memory storing additionalmachine-readable instructions that, when executed by the processor,control the computing system to output the statistic.
 16. The computingsystem of claim 14, further comprising the quantum computer.